diff --git a/404.html b/404.html
index b21619d..d41fde0 100644
--- a/404.html
+++ b/404.html
@@ -4,7 +4,7 @@
-
+
@@ -95,7 +95,7 @@
"https://www.linkedin.com/in/julienaraemy/"
]
},
- "datePublished": "Thu Feb 06 2025 14:04:50 GMT+0000 (Coordinated Universal Time)",
+ "datePublished": "Thu Feb 06 2025 14:20:54 GMT+0000 (Coordinated Universal Time)",
"description": "",
"keywords": ""
}
diff --git a/data/acronyms.json b/data/acronyms.json
index c8d3685..7470f43 100644
--- a/data/acronyms.json
+++ b/data/acronyms.json
@@ -144,6 +144,7 @@
"SLMS": "System-LoA-Model-Structure",
"SNA": "Social Network Analysis",
"SNSF": "Swiss National Science Foundation",
+ "SPARQL": "SPARQL Protocol and RDF Query Language",
"STS": "Science and Technology Studies",
"TB": "Terabytes",
"TBox": "Terminological Box",
diff --git a/feed.json b/feed.json
index fae2627..53b71b5 100644
--- a/feed.json
+++ b/feed.json
@@ -51,7 +51,7 @@
"id": "https://phd.julsraemy.ch/thesis.html",
"url": "https://phd.julsraemy.ch/thesis.html",
"title": "Linked Open Usable Data for Cultural Heritage: Perspectives on Community Practices and Semantic Interoperability",
- "content_html": "Linked Open Usable Data for Cultural Heritage: Perspectives on Community Practices and Semantic Interoperability PhD Thesis in Digital Humanities, completed as part of the Graduate School of Social Sciences’ (G3S) doctoral programme. It was successfully defended on 18 November 2024 (slides). This page will host a lightweight HTML version of my thesis, optimised for easy access and readability. The PDF version (e-dissertation) is available on the University of Basel’s repository: https://doi.org/10.5451/unibas-ep96807. Page in construction (please be patient ⌛) Author Dr. Julien A. Raemy (University of Basel) https://orcid.org/0000-0002-4711-5759 Supervisors Prof. Dr. Peter Fornaro (University of Basel) https://orcid.org/0000-0003-1485-4923 Prof. Dr. Walter Leimgruber (University of Basel) Dr. Robert Sanderson (Yale University) https://orcid.org/0000-0003-4441-6852 Abstract Digital technologies have fundamentally transformed how Cultural Heritage (CH) collections are accessed and engaged with. Linked Open Usable Data (LOUD) specifications, including the International Image Interoperability Framework (IIIF) Presentation API 3.0, Linked Art, and the W3C Web Annotation Data Model, have emerged as web standards to facilitate the description and dissemination of these valuable resources. Despite the widespread adoption of IIIF, implementing LOUD specifications, particularly in combination, remains challenging. This is especially evident in the development and assessment of infrastructures, or sites of assemblage, that support these standards. This research is guided by two perspectives: community practices and semantic interoperability. The first perspective assesses how organizations, individuals, and apparatuses engage with and contribute to the consensus-making processes surrounding LOUD. By examining these practices, the social fabrics of the LOUD ecosystem can be better understood. The second perspective focuses on making data meaningful to machines in a standardized, interoperable manner that promotes the exchange of well-formed information. This research is grounded in the SNSF-funded project, Participatory Knowledge Practices in Analogue and Digital Image Archives (PIA) (2021–2025), which aims to develop a citizen science platform for three photographic collections from the Cultural Anthropology Switzerland (CAS) archives. Actor-Network Theory (ANT) forms the theoretical foundation, aiming to describe the collaborative structures of the LOUD ecosystem and emphasize the role of non-human actors. Beyond its implementation within the PIA project, this research includes an analysis of the social dynamics within the IIIF and Linked Art communities and an investigation of Yale’s Collections Discovery platform, LUX. The research identifies socio-technical requirements for developing specifications aligned with LOUD principles. It also examines how the implementation of LOUD standards in PIA highlights their potential benefits and limitations in facilitating data reuse and broader participation. Additionally, it explores Yale University’s large-scale deployment of LOUD standards, emphasizing the importance of ensuring consistency between Linked Art and IIIF resources within the LUX platform for the CH domain. The core methodology of this thesis is an actor- and practice-centered inquiry, focusing on a detailed examination of specific cosmologies within LOUD-driven communities, PIA, and LUX. 
This micro-perspective approach provides rich empirical evidence to unravel the intricate web of cultural processes and constellations in these contexts. Key empirical findings indicate that LOUD enhances the discoverability and integration of data in CH, requiring community-driven consensus on model interoperability. However, significant challenges include engaging marginalized groups, sustaining long-term participation, and balancing technological and social factors. Strategic use of technology and the capture of digital materiality are critical, but LOUD also poses challenges related to resource investment, data consistency, and the broader implementation of complex patterns. LOUD should lead efforts to improve the accessibility and usability of CH data. The community-driven methodologies of IIIF and Linked Art inherently foster collaboration and transparency, making these standards essential tools in evolving data management practices. Even for institutions and projects that do not adopt these specifications, the socio-technical practices of LOUD offer vital insights into effective digital stewardship and strategies for community engagement. Keywords: Actor-Network Theory; Community of Practice; Cultural Anthropology Switzerland; Cultural Heritage; Digital Infrastructure; International Image Interoperability Framework; Knowledge Practices; Linked Art; Linked Data; LUX; Participatory Archives; Photographic Archives; Semantic Interoperability; Web Annotation Data Model

Table of Contents
1. Introduction
2. Context
3. Interlinking Cultural Heritage Data
4. Exploring Relationships through an Actor-Network Theory Lens
5. Research Scope and Methodology
6. The Social Fabrics of IIIF and Linked Art
7. PIA as a Laboratory
8. Yale’s LUX and LOUD Consistency
9. Discussion
10. Conclusion

1. Introduction Since its inception in 2011, the IIIF has revolutionised[1] the accessibility of image-based resources. Initially driven by the needs of manuscript scholars, IIIF focused on two-dimensional images, but has since expanded to encompass a wide range of image-based resources, including audiovisual materials and, in the near future, 3D content. Similarly, Linked Art, formally established in 2017, initially concentrated on art museum objects but has since broadened its scope to model a variety of CH entities, leveraging CIDOC-CRM, a renowned ontology in the museum and DH space. Both initiatives aim to break down silos: IIIF focuses on improving the presentation of digital objects, while Linked Art focuses on their semantic description. Together, they make CH data more accessible through IIIF and more meaningful to machines through Linked Art. These efforts have primarily benefited the CH domain. A key commonality is that the main APIs these communities create align with the LOUD design principles, either intentionally or as empirically demonstrated through use cases. These principles enable software developers to build compliant tools and services without needing to fully understand RDF, a framework for representing information on the web. Additionally, they may not need to grasp all LOD principles, which promote the interlinking of data from diverse datasets using KOS such as thesauri. WADM, a W3C standard, is also recognised as a LOUD specification. It provides a framework for creating interoperable annotations on web resources, facilitating the linking and sharing of data across different platforms and applications. 
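To make the shape of these specifications concrete, the following is a minimal sketch of a WADM annotation serialised as JSON-LD. It is illustrative only: the example.org identifiers, the comment text, and the annotated image URL are hypothetical placeholders, not resources from any system discussed in this thesis.

```json
{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "id": "https://example.org/annotations/1",
  "type": "Annotation",
  "motivation": "commenting",
  "body": {
    "type": "TextualBody",
    "value": "A haymaking scene, probably photographed in the late 1930s.",
    "format": "text/plain",
    "language": "en"
  },
  "target": "https://example.org/images/photo-001.jpg"
}
```

The body carries the content of the annotation and the target identifies the resource being annotated; the target can also be a SpecificResource with a selector, which is what allows an annotation to be anchored to a region of an image or of a IIIF Canvas rather than to the whole resource.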
These LOUD design principles include the right abstraction for the audience, few barriers to entry, comprehensibility by introspection, documentation with working examples, and the use of many consistent patterns rather than few exceptions. Additionally, both IIIF and Linked Art are driven by vibrant communities, mainly comprising GLAM and higher education institutions. While the standards and principles discussed have broad applications, it is important to clarify the scope of this dissertation. This work does not focus on KGs, for instance by assessing triplestores – databases specifically designed to store and retrieve triples, the fundamental data structures of RDF. Similarly, it does not evaluate SPARQL engines, which are specifically designed to query KGs. Additionally, this dissertation does not address the intersection of ML and IIIF, or the ontological reasoning of Linked Art. Instead, this dissertation concentrates on LOUD: the consistency of its standards, its design principles, and the vibrant communities behind it. It examines JSON-LD serialisation efforts and the crucial intersection required to establish robust semantic interoperability baselines between presentation and semantic layers. It also presents real-world use case implementations, both on a small scale in a laboratory and flexible space within the PIA research project, and on a large scale at Yale, exemplified by the LUX platform that provides access to (meta)data from YUL, YCBA, YUAG, and YPM. The focus is therefore on digital infrastructures capable of delivering JSON-LD files from the above specifications, which are primarily, though not exclusively, CH resources. It is more about the different actors – both human and non-human – that create and maintain these interconnected systems and the dynamic interactions that sustain them. The deployment of various LOUD specifications addresses the need for semantic interoperability between CH resources and disparate datasets by establishing a standardised approach to representing and linking data, ensuring that information can be seamlessly shared and understood across different platforms and contexts. This dissertation seeks to carve out a distinct niche by addressing an often-overlooked aspect of IIIF and Linked Art. IIIF is sometimes perceived and studied merely as a service or an appendix, with the content it delivers taking precedence. However, this PhD thesis positions IIIF as a first-class citizen worthy of in-depth study. Similarly, Linked Art, despite its potential and its relatively recent establishment, has been the subject of very few scholarly papers. This gap underscores the significance of LOUD in this context. Furthermore, this thesis elevates Linked Art to a position of primary importance, recognising its significance and advocating for its thorough examination. To thoroughly study LOUD and its adherence to design principles, it is essential to immerse ourselves actively in both communities – an approach I have embraced for years. The thesis also emphasises the importance of participatory efforts and collaboration between research projects, which typically have shorter lifespans, and memory institutions, which need to implement technical standards as a lingua franca. In doing so, it reveals the mediating role of LOUD in advancing the heritage sphere. To truly understand IIIF, Linked Art, and to a lesser extent WADM, it is crucial to examine the social fabrics and consensus decision-making of each community. 
Among these considerations are how the specifications can be implemented pragmatically, and how the standards can support the implementation and maintenance of more extensive semantic interoperability efforts. The significance of this research lies in highlighting the commitment and diligence of the individuals and organisations that make up both the IIIF and Linked Art communities. It aims to demonstrate that community-driven practices, such as those exemplified by IIIF and Linked Art, have a potential that goes beyond the mere sharing of digital objects and their associated metadata. The more people who embrace these approaches and implement the associated specifications, the more society as a whole will benefit. Furthermore, this research illustrates that IIIF is no longer limited to two-dimensional images, that Linked Art is not restricted to artworks, and that WADM is a simple, content-agnostic standard that can be easily integrated into a range of systems. This adaptability is a strength of LOUD standards, which are designed to be simple yet effective. LOUD can serve a variety of purposes, primarily rooted in CH, but with the potential to extend its benefits to other sectors. The true beauty of LOUD lies in its ability to foster networking opportunities and transparent socio-technical practices, demonstrating its value beyond mere technical implementation. By emphasising these aspects, this dissertation highlights the wider impact of LOUD in promoting semantic interoperability and enhancing collaborative efforts within the heritage field and beyond. In addition, the implementation of standards through PIA underlines the potential for similar participatory or citizen science projects, while the LUX initiative serves as an illustrative example of robust infrastructure and cross-unit engagement. These examples demonstrate the practical applications and far-reaching implications of adopting LOUD standards in different contexts. This dissertation is structured across ten chapters, each building upon the previous ones up to Chapter 5 to provide a comprehensive understanding of the research. These initial chapters lay the foundation of the study, establishing the context, theoretical framework, and methodological approaches. After this foundational section, Chapters 6, 7, and 8 present empirical studies that, while interconnected, can be read independently if desired. These chapters offer detailed insights into specific aspects of the research and can be appreciated on their own or as part of the broader narrative. The thesis continues with Chapter 2, which extends this introduction by providing more information about the research setting, specifically PIA. Chapter 3 follows with an extensive literature review, offering a comprehensive overview of methods to interlink CH data. Next, Chapter 4 presents the theoretical framework, conceptualised as a toolbox and firmly rooted in ANT, guiding the analysis and discussion throughout the dissertation. Following this, Chapter 5 details the research scope and methodology, explaining the approaches and methods employed in the study. Moving on to the empirical work, Chapter 6 sheds light on the social fabrics of IIIF and Linked Art, exploring the communities and practices that underpin these initiatives. Chapter 7 then examines the implementation of LOUD standards within PIA, highlighting the practical aspects and challenges encountered. 
This is followed by Chapter 8, which focuses on the LUX initiative at Yale, examining the underlying governance and interdepartmental ownership of the Yale Collections Discovery platform. The discussion of findings is presented in Chapter 9, where the results from the empirical chapters are synthesised and analysed in relation to the theoretical framework. Finally, Chapter 10 concludes the thesis, summarising the key insights and contributions of the research while outlining potential directions for future study. 2. Context In this chapter, I will set the stage for my PhD thesis by providing important background information. First, in Section 2.1, I will explain why I chose the title for my thesis. This will give you an understanding of the main focus and the direction of my research. Next, in Section 2.2, I will describe the PIA research project, which is central to my work. This section will cover the project’s goals, significance, and overall framework. In Section 2.3, I will detail my specific contributions to the PIA project. I will emphasise how my work fits into the larger project and its importance to my thesis. Finally, in Section 2.4, I will talk about my active participation in the IIIF and Linked Art communities. This section will highlight how my involvement in these communities has influenced my research and its broader implications. 2.1 PhD Title I chose the title ‘Linked Open Usable Data for Cultural Heritage: Perspectives on Community Practices and Semantic Interoperability’ as it encapsulates the essence of my research focus, though I could indeed have chosen others. During the initial stages of my research, multiple working titles were explored to capture the diverse facets of my interests and objectives. If I was quite sure about having ‘Linked Open Usable Data’ in the title after the third iteration, I was quite unsure of what should follow and whether a subtitle was actually needed at all. Amidst this dynamic progression, the underlying theme of my research remained steadfast – to delve into the transformative potential of LOUD for CH. I also opted to maintain ‘Cultural Heritage’ in the title of my thesis. While ‘Digital Heritage’ holds its appeal, my choice reflects a broader narrative that acknowledges the crucial role of CHIs and spotlights the multifaceted nature of heritage preservation, encapsulating both its digital facets and the essential contribution of individuals and institutions in curating, interpreting, and making heritage accessible. As for the subtitle, while I do explore CoP as defined by @lave_situated_1991 and @wenger_communities_2011 through investigating the social fabrics of the IIIF and Linked Art communities, my main interest lies in the broader application of LOUD for describing and interlinking CH resources. Thus, I decided to opt for the more generic ‘Community Practices’ as the first axis or perspective. For the second perspective, I wanted to see how semantic interoperability can be achieved through standards adhering to the LOUD design principles, as they seem to be key enablers for seamless collaboration and knowledge exchange among practitioners. There was a time in my research when I envisaged decoupling these two perspectives, perceiving them as distinct dimensions. However, what really captivates me is the unification of these factors to facilitate collective reasoning for both humans and machines. In summary, this title reflects my enthusiasm for using web-based and community-driven technologies to transform the way we understand, share and value CH. 
2.2 The PIA Research Project I undertook my doctoral studies within the scope of the PIA research project, financed by the SNSF under its Sinergia funding scheme from February 2021 to January 2025[2]. The project aimed to analyse the interplay of participants, epistemological orders and the graphical representation of information and knowledge in relation to three photographic collections from CAS. It sought to bring together the world of data and things in an interdisciplinary manner, exploring the phases of the analogue and digital archive from a cultural anthropological, technical and design research perspective [@felsing_community_2023 p. 42]. As part of this endeavour, interfaces were developed to enable the collaborative indexing and use of photographic archival records [@chiquet_participatory_2023 p. 110]. I discuss in more detail the interdisciplinary components and briefly introduce the people involved in the project in Subsection 2.2.1, then talk about the photographic collections that were the overarching narrative of the research in Subsection 2.2.2, and lastly, in Subsection 2.2.3, the vision that we had put together. The project, divided into three interdisciplinary teams, was led by the University of Basel through the Institute for Cultural Anthropology and European Ethnology[3] (Team A) and the DHLab[4] in collaboration with the DBIS group (Team B) as well as by the HKB[5], an art school and department of the Bern University of Applied Sciences (Team C) [@felsing_community_2023 p. 43]. Table 2.1 lists the people who contributed to the project, broken down by the three teams and their particular perspectives.

Table 2.1: PIA Team Core Members
A) Anthropological: Prof. Dr. Walter Leimgruber, Team Leader and Dissertation Supervisor; Dr. Nicole Peduzzi, Photographic Restoration and Digitisation Supervisor; Regula Anklin, Conservation and Restoration Specialist (project partner at Anklin & Assen); Murielle Cornut, PhD Candidate in Cultural Anthropology; Birgit Huber, PhD Candidate in Cultural Anthropology; Fabienne Lüthi, PhD Candidate in Cultural Anthropology
B) Technical: Prof. Dr. Peter Fornaro, Team Leader and Dissertation Supervisor; Prof. Dr. Heiko Schuldt, Dissertation Supervisor (project partner at the University of Basel); Dr. Vera Chiquet, Postdoctoral Researcher; Adrian Demleitner, Software Developer (2021-2023); Fabian Frei, Software Developer (2023-2025); Christoph Rohrer, Software Developer (2023-2025); Julien A. Raemy, PhD Candidate in Digital Humanities; Florian Spiess, PhD Candidate in Computer Science
C) Communicative: Dr. Ulrike Felsing, Team Leader and Dissertation Supervisor; Prof. Dr. Tobias Hodel, Dissertation Supervisor (project partner at the University of Bern); Daniel Schoeneck, Research Fellow; Lukas Zimmer, Designer (project partner at A/Z&T); Max Frischknecht, PhD Candidate in Digital Humanities

2.2.2 Photographic Collections/Archives as Anchors CAS has historically been engaged in active collaborations that bridge academic research and the public sphere, primarily through traditional analogue methods. The PIA project was created with the intention of exploring the complexities inherent in both analogue and digital approaches, and to encourage and investigate these collaborative endeavours between academia and the wider public. As such, PIA represents a paradigm shift within the scope of projects associated with or supported by CAS, facilitating the seamless integration of digital tools to explore multiple facets of participation and engagement. 
This transformative endeavour embodies a profound exploration of new intersections where scholarly endeavours intertwine with the active involvement of citizens. PIA drew on three collections: one focusing on scientific cartography and titled ‘Atlas of Swiss Folklore’ (Atlas der Schweizerischen Volkskunde), a second from the estate of the photojournalist Ernst Brunner (1901–1979), and a third collection consisting of vernacular photography owned by the Kreis Family (1860–1970). SGV_05 ASV consists of 292 maps and 1000 pages of commentary published from 1950 to 1995 — an example of such a map is shown in Figure 2.1. The collection originated in an extensive survey of the Swiss population, commissioned by CAS in the 1930s and 1940s, on many issues pertaining, for instance, to everyday life, local laws, superstitions, celebrations or labour [@weiss_atlas_1940]. The contents were compiled by researchers and by people who were described as [6]. Questions were asked about everyday habits, community rights, work, trade, superstitions, and many other topics [@schmoll_richard_2009; @schmoll_vermessung_2009]. This collection offers a snapshot of everyday life in Switzerland right before the beginning of a modernisation process that fundamentally changed lifestyles in all areas during the postwar period. A digitised version of the ASV would not only allow the results of that time to be enriched with further findings [@schranz_critical_2021], but would also make transparent how knowledge was generated in cartographic form through a complex process along different types of media and actors. The restoration, digitisation, cataloguing and indexing efforts all took place throughout PIA under the supervision of Birgit Huber, who extensively based her doctoral research on this particular collection [see @huber_entdeckung_2023]. Figure 2.1: Map from the SGV_05 Collection Relating to Question 93 Showing Walks and Excursions at Pentecost. ASV. CAS. CC BY-NC 4.0 SGV_10 Kreis Family comprises approximately 20,000 loose photographic objects from a wealthy Basel-based family, spanning from the 1850s to the 1980s; a quarter of them are organised and kept in 93 photo albums, as illustrated by Figure 2.2. This private collection was acquired by CAS in 1991. The collection, which originally arrived in banana cases and was enigmatic due to the lack of clear organisation or accompanying information from the family, posed significant challenges. Despite these initial hurdles, CAS undertook meticulous efforts to catalogue and preserve its contents [@felsing_re-imagining_2024 p. 42]. The pictures were taken by studio photographers as well as by family members themselves. The Kreis Family collection represents a typical example of urban bourgeois culture and gives a comprehensive insight into the development of private photography over the course of a century [@pagenstecher_private_2009]. The photographic materials and formats are very diverse, ranging from prints to negatives, small, medium or large format photographs, black and white or colour. The collection also encompasses many photographic techniques, from the one-off daguerreotypes and ferrotypes, to the glass-based negatives that could be reproduced en masse, to the modern paper prints. While some of the albums and loose images were restored and digitised during the 2014 project, much of this work was completed during PIA and overseen by Murielle Cornut, whose doctoral investigation centred on the study of photo albums [see @cornut_open_2023]. 
Figure 2.2: A Photo Album Page from the SGV_10 Collection, Bearing the Following Inscription: Botanische Excursion ins Wallis, Pfingster 1928. SGV_10A_00031_015. Kreis Family. CAS. CC BY-NC 4.0 SGV_12 Ernst Brunner comprises a donation of about 48,000 negatives and 20,000 prints to the CAS archives from Ernst Brunner (1901–1979), a self-taught photojournalist who documented a wide range of folkloristic themes, mainly in the 1930s and 1940s — as shown by Figure 2.3. He is one of the most important photographers of the era and one of the most outstanding visual chroniclers of Swiss society [@pfrunder_ernst_1995]. His photographs show rural lifestyles, but also urban motifs. In his late work, he led the documentation and research on farmhouses in a specific Swiss district, a project initiated by CAS. Before Ernst Brunner became an independent photojournalist in the mid-1930s, he worked as a carpenter, influenced by the ideas of the Bauhaus and Neues Bauen movements. This can also be seen in the aesthetics and formal language of his photography. While all the black and white negatives were digitised and recorded between 2014 and 2018, the digitisation of the prints, a selection made by Ernst Brunner himself, was conducted at the end of the PIA research project. The latter was supervised by Fabienne Lüthi, whose PhD was about organisational systems and knowledge practices in the Ernst Brunner Collection. Figure 2.3: Picture from the SGV_12 Collection Showing Walkers Looking at the Train Timetable. [Wanderer studieren den Fahrplan in der Bahnhofhalle]. Lucerne, 1938. Ernst Brunner. SGV_12N_00716. CAS. CC BY-NC 4.0 Whereas each of the PhD Candidates in Cultural Anthropology was assigned a particular collection whose content was, to varying degrees, part of their subject of study, this was not exactly the case for the PhD Candidates in DH, including myself, and in Computer Science. Put differently, we had relative leeway in terms of what interested us in each or all of these three photographic collections. In my case, I briefly explain my contribution to the project in Section 2.3 and then in Chapter 7, the empirical portion of my thesis focusing on the deployment of LOUD specifications using the three CAS photographic collections. Florian Spiess focused on the use of VR through vitrivr, a multimedia retrieval system developed by the DBIS research group at the Department of Mathematics and Computer Science [@spiess_multimodal_2022; @spiess_forschung_2023; @spiess_exploring_2024]. His work included experiments with PIA-related collections, such as the creation of virtual galleries clustered according to content-based similarity [see @peterhans_automatic_2022]. In the case of Max Frischknecht, his doctoral research centred on generative design[7], a methodology to visualise dynamic cultural archives. He mostly worked on the ASV collection and on a mapping tool, a cartographic visualisation designed to explore the CAS photographic archives [see @frischknecht_generating_2022; @eggmann_digitalisierung_2024]. It should also be mentioned that not only did we use the three collections of the CAS photographic archives within the project, but that both formal and informal meetings took place most commonly within the photographic archives at the Spalenvorstadt premises in the old Gewerbemuseum, and later either at the premises on Allschwilerstrasse, though less frequently, or at Rheinsprung where the Institute for Cultural Anthropology and European Ethnology is located. 
This meant that there was a strong and sometimes blurred entanglement between those involved in the archives and the PIA core team members. 2.2.3 Project Vision Between December 2021 and March 2022, we worked together to develop and finalise a vision for the project[8]. It includes seven key priorities, or pillars, which were meant to strengthen the interdisciplinary perspectives of PIA. Although ambitious, these elements were of paramount importance to us and served as a guiding blueprint for all PIA activities. Hereafter is a modified version of the vision[9] taken from @cornut_annotations_2023 [p. 4].

Accessibility by developing open interfaces and offering the possibility of expanding the archive and turning it into an instrument of current research that collects and evaluates knowledge with the participation of other users (Citizen Science).

Heterogeneity by making visible where, why and under what circumstances the objects were created, how they were handled and what path they have taken to get to and in the archive. We work on visualisations that take into account the heterogeneous character of archival materials and make their respective biographies visible.

Materiality by conveying the material properties of the objects: they have front and back sides, inscriptions, traces, development errors, they are transparent, multi-layered or fabric-covered. They tell of their origin, use, and peculiarities. We want to make this knowledge accessible and understandable in digital form. To this end, we also consider the necessary infrastructure involved in the creation as part of their narrative: the restoration, the relocation, the indexing, the storage devices, the research tools, the display medium, as well as the process of repro-photography.

Interoperability as a crucial component, achieved by supporting digital means that allow different stakeholders to freely access and interact with the project’s data. Both humans and machines can use, contribute to, correct and annotate the existing data in an open and interoperable manner, thus encouraging exchange and the creation of new knowledge. To do this, we use web-based standards that are widely adopted in the cultural heritage field (a minimal sketch of one such standard follows this list).

Affinities by leveraging data models and pattern recognition which can uncover semantic relationships between entities that were previously incomplete or difficult for users to access. Using specific interfaces and visualisations, we make it possible to explore digital assets and discover forms of relationships and similarities between images.

AI that facilitates automated searches for simple image attributes such as colour, shapes, and localisation of image components. It should also become possible to recognise texts and object types for extracting metadata.

Bias Management by taking into account that associated metadata is human-made[10] and thus never objective. Collections and their metadata reflect biases or focus narrowly on selected areas and perceptions. Machines working on the basis of such data automatically reproduce the implicit biases in decision-making due to so-called biased algorithms. Therefore, understanding the data used for training and the algorithms applied for decision making is crucial to ensure the integrity of the application of these technologies in archives. We take ethical issues into account when using AI and visualisations, because the higher the awareness of a possible bias, the faster it can be detected or brought up for consideration with users. 
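As a concrete illustration of the web-based standards invoked in the Interoperability pillar, here is a minimal, hedged sketch of a IIIF Presentation API 3.0 Manifest containing a single Canvas. All example.org URIs, the label, and the dimensions are placeholder assumptions rather than actual PIA identifiers.

```json
{
  "@context": "http://iiif.io/api/presentation/3/context.json",
  "id": "https://example.org/iiif/object-1/manifest.json",
  "type": "Manifest",
  "label": { "en": [ "A photographic print (placeholder record)" ] },
  "items": [
    {
      "id": "https://example.org/iiif/object-1/canvas/p1",
      "type": "Canvas",
      "height": 3000,
      "width": 2000,
      "items": [
        {
          "id": "https://example.org/iiif/object-1/page/p1",
          "type": "AnnotationPage",
          "items": [
            {
              "id": "https://example.org/iiif/object-1/anno/p1-image",
              "type": "Annotation",
              "motivation": "painting",
              "body": {
                "id": "https://example.org/iiif/image/object-1/full/max/0/default.jpg",
                "type": "Image",
                "format": "image/jpeg",
                "service": [
                  {
                    "id": "https://example.org/iiif/image/object-1",
                    "type": "ImageService3",
                    "profile": "level2"
                  }
                ]
              },
              "target": "https://example.org/iiif/object-1/canvas/p1"
            }
          ]
        }
      ]
    }
  ]
}
```

Note that the image is attached to the Canvas through a WADM-style Annotation with the motivation painting; this reuse of the annotation model inside the presentation layer is one of the consistent patterns that LOUD favours.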
As my thesis is notably concerned with semantic interoperability, Interoperability and Affinities are of particular importance to it, although I recognise the importance of all pillars. Each of these resonated with me and my fellow PhD Candidates. As we immersed ourselves in the vision of the PIA research project, it became a unifying thread that brought us together in our research ambitions. We found that all these priorities within the project spoke to us at different points and provided a strong point of communication and practice in the development of processes, prototypes or interfaces. 2.3 Contribution to PIA and its Relevance to the Thesis To develop a participatory platform, an open and sustainable technological foundation for facilitating the reuse of CH resources was needed [@raemy_applying_2021]. Throughout the PIA project, I was mainly involved in the extension of the data infrastructure, the uptake of IIIF, and the design of the data model, leveraging Linked Art and WADM [@raemy_interlinking_2024]. As a member of Team B, I undertook this PhD as a bridge between the different teams, mostly participating in discussions with the three doctoral candidates from Team A to further develop and agree on the CAS data model, and with the software developers from my team to discuss the impact of the data model on our evolving — yet transitory — infrastructure, as well as helping to implement the APIs adhering to the LOUD design principles. It was necessary to redesign the data model within the context of a database migration, from Salsah to the DSP, which happened between November 2021 and March 2024. This updated version, based on the Knora Base Ontology[11], corresponded to the needs of the CAS archives and to some extent to those of PIA, in particular enabling the PhD Candidates in Cultural Anthropology to make more precise assertions, whether in terms of descriptive metadata, in the ability to link one object to another, or in providing comments on these objects in several narrative forms. Moreover, an assessment of the appropriate technical standards for improved usability of the objects by both humans and machines was carried out, as a basis for extending the capabilities provided by DaSCH, such as helping the software developers to implement SIPI[12], a C++ image server compatible with the IIIF Image API, and to build services that create IIIF Presentation API 3.0 resources. While the theoretical framework of the thesis extends across the scope of PIA, the empirical part focuses on a specific set of findings derived from the research project, outlined in Chapter 7 under the title ‘PIA as a Laboratory’. In this chapter, I discuss the data model and its refinement as well as the generation of custom IIIF Manifests during the specific digitisation, cataloguing and indexing efforts that took place throughout the project for the three CAS collections (SGV_05, SGV_10 and SGV_12) under investigation, the implementation of LOUD standards, and the overall design of the technological underpinnings. 2.4 Involvement within the IIIF and Linked Art Communities I must acknowledge the invaluable role that my involvement within the IIIF and Linked Art communities has played in shaping my journey as a trained information specialist and an aspiring DH practitioner. Being an active participant in both communities has not only broadened my understanding of the latest developments in the field but has also profoundly influenced the trajectory of this dissertation. 
I have been involved within the IIIF community since October 2016 and the Working Groups Meeting that happened in The Hague[13]. This significant journey was, in fact, initiated by a recommendation from my first supervisor, Peter Fornaro, during my time as an undergraduate doing an internship at the DHLab. Little did I know that this recommendation would lead me to carry out a PhD and to look at IIIF not only as community-driven standards but as an object of study. Engaging with the IIIF community exposed me to cutting-edge advances in image interoperability and standards, and fostered a deeper appreciation for the importance of digital representations of cultural heritage. Through collaborative discussions with experts from diverse backgrounds, I gained new perspectives on the potential of technology to advance humanities research and preserve our collective cultural memory. Similarly, my involvement in the Linked Art community introduced me to the opportunities offered by LOUD and its transformative impact on research discourse. Exposure to Linked Data methodologies and the CIDOC-CRM has significantly influenced the way I have structured and interpreted the data in this dissertation, thereby enriching its scholarly breadth and rigour. I started to be actively involved in Linked Art at the beginning of my PhD in 2021, but I was already following the community by 2020, driven by the efforts of Rob Sanderson, my third supervisor. By mid-2023, I had become a member of the Editorial Board. The individuals I have met and the knowledge shared in these vibrant communities have deeply informed my approach as a scholar. The invaluable connections and collaborations I have made have expanded my network of fellow researchers, educators, and experts, leading to fruitful discussions that have significantly shaped the research questions addressed in this thesis. The events and workshops organised by these communities have also provided immersive learning experiences, giving me first-hand insights into the tools, technologies and methodologies used in the context of describing and disseminating CH data. The dynamic ecosystem of these communities has served as an inspiring backdrop, fostering innovative thinking and encouraging a more holistic approach to my research. 3. Interlinking Cultural Heritage Data Interlinking CH data is an important aspect of publishing heritage collections over the web, in particular by using LOD technologies to make assertions more easily readable and meaningful to machines [@marcondes_integrated_2021]. Due to the complexity of CH data and their intrinsic inter-relationships, it is necessary to define their nature and introduce controlled vocabularies and ontologies that can be integrated with existing web standards and interoperable with relevant platforms [@bruseker_cultural_2017; @hyvonen_using_2020]. Efforts to interlink CH data have brought about significant advancements, but challenges remain. One such challenge is finding a balance between completeness and precision of expression to ensure that CH data remain accessible and usable to a wider audience. Addressing this challenge, the Linked Open Usable Data (LOUD) design principles and the specifications that adhere to them, such as the IIIF Presentation API 3.0 and Linked Art, offer a promising approach [@raemy_enabling_2023]. 
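To illustrate what such a specification looks like in practice, below is a minimal, hypothetical Linked Art record for a photographic object, serialised as JSON-LD. The identifier and label are placeholders, and the AAT concept used for classification is chosen here purely for illustration.

```json
{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "https://example.org/object/1",
  "type": "HumanMadeObject",
  "_label": "Photographic print (placeholder)",
  "classified_as": [
    {
      "id": "http://vocab.getty.edu/aat/300046300",
      "type": "Type",
      "_label": "Photographs"
    }
  ],
  "identified_by": [
    {
      "type": "Name",
      "content": "Photographic print (placeholder)"
    }
  ]
}
```

Although the record is compact and legible, every key maps to a CIDOC-CRM construct via the Linked Art context, so the same document remains valid RDF for systems that need it – the usability compromise at the heart of LOUD.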
By focusing on usability aspects from the perspective of software developers and data scientists involved in designing visualisation tools and data aggregation approaches, LOUD strives to enhance the overall user experience [@sanderson_keynote_2019]. Finding this equilibrium becomes crucial as CH data continues to grow in complexity and size, necessitating the seamless integration of native web technologies. The LOUD concept cultivates an environment that encourages the formation of vibrant CoP, wherein an essential principle is the availability of comprehensive documentation supplemented with practical examples [@raemy_ameliorer_2022]. Moreover, the emphasis on leveraging widely adopted technologies enhances the interoperability of data and promotes its wider dissemination. With LOUD principles guiding the linking of CH data, the resulting web of knowledge becomes more than just a machine-readable resource; it transforms into a user-centric ecosystem where both accessibility of Linked Data and usability intersect to enable scholars and a wider audience to engage in the exploration and appreciation of CH [@newbury_loud_2018]. Finally, by fostering a collaborative, knowledge-sharing mindset, LOUD empowers software developers to implement data in a robust way, drawing insights from shared experiences [see @page_linked_2020]. In this chapter, which serves as the literature review of the PhD thesis, I attempt to build on this brief introduction by dividing the insights into seven sections in order to provide an overview of the key concepts related to interlinking data in the CH domain. The literature review primarily encompasses works published up until December 2023, providing a comprehensive snapshot of the field’s current state and its evolution. Section 3.1 discusses what makes CH data stand out and Section 3.2 is about CH metadata standards, while Section 3.3 explores the technological trends, scientific movements and guiding principles that have shaped the field. Section 3.4 provides an overview of the web as an open platform, which is essential to understanding the current landscape of interlinking CH data. Section 3.5 focuses on LOUD, while Section 3.6 looks at characterising the community practices and semantic interoperability dimensions for CH. Finally, in Section 3.7, I summarise key elements from each section, giving within each some initial thoughts with respect to LOUD, and then conclude the chapter with some considerations on why we as a society need to care about CH data. 3.1 What Makes Cultural Heritage Data Stand Out? Here, I aim to establish the indirect territory of my study, as I am situated on a distinct plane that focuses on web technologies and standards — as well as the software and services that enable them — as the subjects of investigation. However, it is crucial to acknowledge that LOUD specifications owe their existence to the available data that have served as case studies. Thus, their significance can be best understood through the lens of data, and I recognise here the pivotal role played by CH practitioners — encompassing individuals from research and memory institutions — who have had a significant impact on specifying a series of web-based standards and who have helped to move forward the discovery of CH data and beyond, in particular those belonging to the public domain, in an open manner. In Subsection 3.1.1, I provide an introduction to CH as recognised by UNESCO. 
I explore the tangible, intangible, and natural dimensions of CH, laying the foundation for further discussions on its representation and preservation, notably by giving a first definition of CH data. Next, in 3.1.2, I look at the challenges of representation and embodiment of CH data. This subsection examines the challenges in describing and preserving its materiality or embodied aspects, as well as the significance of collective efforts, communities, and the interplay of technologies. Thirdly, I discuss what I call ‘Collectives and Apparatuses’ in 3.1.3, where I highlight how actors, in terms of collaborative actions and apparatuses, play a pivotal role in CH. 3.1.1 Cultural Heritage The legacy of CH encompasses physical artefacts and intangible aspects inherited from past generations, reflecting the history and traditions of societies. Meanwhile, CH constantly evolves due to complex historical processes, necessitating preservation and protection efforts to prevent its loss over time [@loulanski_revising_2006]. The dynamic nature of CH demands collaborative actions, including documentation and the use of a range of technologies. The concept of CH is also characterised by perpetual evolution, mirroring the historical processes that shape societies over time. Social, political, economic, and technological shifts invariably influence the definition and perception of CH, prompting continuous reinterpretations and reevaluations of its significance. Over the years, the enthusiasm for the protection of cultural property has enriched the term with new shades of meaning. As societies undergo transformations, new layers of meaning and relevance are superimposed on existing CH, perpetually enriching its essence. As articulated by [@ferrazzi_notion_2021 p. 765]: ‘Cultural heritage’, as an abstract legacy or as a merge of tangible and intangible values, is able to encompass the totality of culture(s); in so, assuming a symbolic value that brings a clear break with all other terminologies. In conclusion, ‘cultural heritage’ as a legal term has demonstrated more than any others to be a real ensemble of historical stratification and cultural diversity. The advent of globalisation and rapid advancements in technology have further accelerated the evolution of CH. Increased interconnectedness and cross-cultural interactions have led to the fusion of traditions and the emergence of novel cultural expressions. Moreover, the digital era has facilitated the dissemination of CH resources on a global scale, transcending geographical barriers and preserving cultural knowledge for future generations [@portales_digital_2018]. Thus, the intriguing nature of CH resources can be attributed to their multifaceted and diverse characteristics. The conservation and promotion of these resources demand a nuanced comprehension of the various types of heritage resources, culminating in effective preservation and promotion strategies that can account for their heterogeneity [@windhager_visualization_2019]. According to [@unesco_institute_for_statistics_unesco_2009], CH includes tangible and intangible heritage. Tangible CH refers to physical objects such as artworks, artefacts, monuments, and buildings, while intangible CH comprises practices, knowledge, folklore and traditions that hold cultural significance [@munjeri_tangible_2004]. The concept of heritage has evolved through a process of extension to include objects that were not traditionally considered part of the heritage. 
The criteria for selecting heritage have also changed, taking into account cultural value, identity, and the ability of the object to evoke memory. This shift has led to the recognition and protection of intangible CH, challenging a Eurocentric perspective and embracing cultural diversity as a valuable asset for humanity [@vecco_definition_2010]. Conservation guidelines have broadened the concept of heritage to include not only individual buildings and sites but also groups of buildings, historical areas, towns, environments, social factors, and intangible heritage [@ahmad_scope_2006]. In 2019, another UNESCO body defined CH in an even more comprehensive manner, taking into account natural heritage: Cultural heritage is, in its broadest sense, both a product and a process, which provides societies with a wealth of resources that are inherited from the past, created in the present and bestowed for the benefit of future generations. Most importantly, it includes not only tangible, but also natural and intangible heritage. [@unesco_culture_for_development_indicators_methodology_2014 p. 130] In thinking about the concept of CH, I find this last definition particularly resonant. This broader perspective is motivated by my interest in LOUD specifications as a research area, particularly because of their notable data agnosticism, and because it resonates with the subdivision of CH proposed by @hyvonen_cultural_2012 [pp. 1-3]. These services have the adaptability to process and use different types of data, transcending the boundaries of specific domains or disciplines. Although grounded in concrete CH cases, their potential to extend to any type of data, including those from STEM, is a compelling prospect that warrants further exploration, a point I will return to later. The following sub-subsections aim to briefly discuss tangible, intangible, and natural heritage, as well as providing a definition of CH data which can serve as a foundational reference for this thesis. 3.1.1.1 Tangible Heritage Tangible CH encompasses physical artefacts and sites of immense cultural significance that are passed through generations in a society [@vecco_definition_2010]. These objects are tangible manifestations of human creativity, representing artistic creations, architectural achievements, archaeological remains as well as collections held by CHIs. One aspect of tangible CH is artistic creations such as paintings, sculptures and traditional handicrafts. These artefacts embody cultural values and artistic expressions and serve as essential reflections of a society’s collective ethos. For example, artworks such as ‘Irises’ by Vincent van Gogh[14] and Alberto Giacometti’s ‘L’Homme qui Marche I’[15] are revered works of art that have deep cultural significance in Europe and all over the world. The built heritage, including monuments, temples and historic buildings, is another important component of tangible CH. These architectural marvels not only represent past civilisations, but also convey the social values and aspirations of their time. The Taj Mahal, an exemplary white marble structure in India, stands as a poignant testament to Mughal architecture. Closer to where I write this dissertation, one can mention the Abbey of St Gall, a convent dating back to the 8th century which is inscribed on the UNESCO World Heritage List. 
In the context of urban heritage, conventional definitions of built heritage often focus narrowly on the architectural and historical value of individual buildings and monuments, which are well protected by existing legislation. However, the challenge is to preserve urban fragments – areas within towns and cities that may not qualify as designated conservation areas, but are of significant cultural and morphological importance [@tweed_built_2007]. For instance, [@rautenberg_lemergence_1998] proposes two categories of built CH: heritage by designation and heritage by appropriation. Heritage by designation involves experts conferring heritage status on sites, buildings, and cultural objects through a top-down approach, often without public participation. This method can be predictable and uncontroversial, but can be criticised for being elitist and neglecting unconventional heritage. On the other hand, heritage by appropriation emphasises community and public involvement in identifying and preserving cultural expressions, leading to a more inclusive and dynamic understanding of heritage. Archaeological sites are also an integral part of tangible CH, offering invaluable insights into past societies and ways of life. As of May 2024, the UNESCO World Heritage List includes 1,199 cultural and natural sites in 168 different state parties — including 48 sites in transboundary regions[16]. Sites such as Machu Picchu, an impressive Inca citadel in the Peruvian Andes, bear witness to the architectural achievements and cultural practices of ancient civilisations. While archaeological sites are invaluable, they face significant threats such as looting, destruction, exploitation, and extreme weather phenomena [@bowman_transnational_2008; @micle_archaeological_2014]. To safeguard them, conservation efforts must be case-specific and include documentation and assessment of experiences gained [@aslan_protective_1997]. The preservation of tangible CH extends beyond physical objects to include libraries, archives and museums that house collections of books, manuscripts, historical documents and artefacts. Incidentally, the term “cultural property” is also employed as a related concept to tangible CH, encompassing both movable and immovable properties as opposed to less tangible cultural expressions [@ahmad_scope_2006]. Cultural property is protected by a number of international conventions and national laws. For instance, the Blue Shield[17] — an international organisation established in 1996 by four non-governmental organisations[18] — aims to protect and preserve heritage in times of armed conflict and natural disasters [@van_der_auwera_unesco_2013]. Its mission was revised in 2016: The Blue Shield is committed to the protection of the world’s cultural property, and is concerned with the protection of cultural and natural heritage, tangible and intangible, in the event of armed conflict, natural- or human-made disaster. [@blue_shield_blue_2016 art. 2.1] Overall, tangible CH is a testament to human ingenuity and cultural diversity, and serves as a bridge between the past and the present. Its preservation is a collective responsibility, ensuring that the legacy of past generations endures and the wealth of cultural diversity continues to enrich the fabric of society. 
3.1.1.2 Intangible Heritage The concept of intangible heritage emerged in the 1970s and was coined at the UNESCO Mexico Conference in 1982 [@leimgruber_switzerland_2010] with the aim of protecting cultural expressions that were previously excluded from preservation efforts [@hertz_politiques_2018]. UNESCO's previous focus had been on material objects, primarily from wealthier regions of the global North, leaving the intangible cultural heritage of the South overlooked. Attempts to protect intangible heritage through legal measures like copyright and patents were ineffective due to the collective nature of these cultural expressions and the anonymity of creators. The 2003 Convention for the Safeguarding of the Intangible Cultural Heritage acknowledges that intangible CH is essential for cultural diversity and sustainable development. Below is the definition given by the Convention: ‘The Intangible Cultural Heritage’ means the practices, representations, expressions, knowledge, skills – as well as the instruments, objects, artefacts and cultural spaces associated therewith – that communities, groups and, in some cases, individuals recognize as part of their cultural heritage. This intangible cultural heritage, transmitted from generation to generation, is constantly recreated by communities and groups in response to their environment, their interaction with nature and their history, and provides them with a sense of identity and continuity, thus promoting respect for cultural diversity and human creativity. [@unesco_basic_2022] According to UNESCO, intangible CH can be manifested in the following domains: oral traditions and expressions, including language as a vehicle of the intangible CH; performing arts; social practices, rituals and festive events; knowledge and practices concerning nature and the universe; traditional craftsmanship. Overall, intangible CH is a multifaceted concept that encompasses both traditional practices inherited from the past and contemporary expressions in which diverse cultural groups actively participate [@munjeri_tangible_2004; @leimgruber_was_2008]. It includes elements shared by different communities, whether they are neighbouring villages, distant cities around the world, or practices adapted by migrant populations in new regions. These expressions have been passed down from generation to generation, evolving in response to their environment, and play a crucial role in shaping our collective identity and continuity. Intangible CH promotes social cohesion, strengthens a sense of belonging and responsibility, and enables individuals to connect with different communities and society at large. Central to the nature of intangible CH is its representation within communities. Its value goes beyond mere exclusivity or exceptional importance; rather, it thrives on its association with the people who preserve and transmit their knowledge of traditions, skills and customs to others within the community and across generations. The recognition and preservation of intangible CH depends on the communities, groups or individuals directly involved in its creation, maintenance and transmission. Without their recognition, no external entity can decide on their behalf whether a particular practice or expression constitutes their heritage. The community-based approach ensures that intangible CH remains authentic and deeply rooted in the living fabric of society, protected by those who care for and perpetuate it. 
In Switzerland, the Winegrowers’ Festival in Vevey (La Fête des Vignerons), a centuries-old event celebrating the world of winemaking [@vinckMetiersOmbreFete2019], and the Carnival of Basel (Basler Fasnacht) [@chiquet_how_2023] are examples of traditions that are listed among UNESCO's intangible CH. (In)tangibility is not always a straightforward concept and can indeed be blurred, i.e. it goes beyond the mere idea of materialisation. Many artefacts and elements of CH possess both tangible and intangible qualities that intertwine and complement each other, making the distinction less clear-cut. For instance, the Male Face Mask held at the Art Institute of Chicago[19], also known as ‘Zamble’, from the Guro people in the Ivory Coast, holds dual significance as both tangible and intangible CH. As a tangible object, the mask is a physical artefact made from wood, pigment, fabric, and various adornments, combining animal and human features that represent the Guro people’s artistic skills. On the other hand, as an intangible cultural object, the Zamble mask carries profound spiritual and cultural meaning. It plays a significant role in commemorating the deceased during a man’s second funeral. These second funerals are organised months or even years after the actual burial as a way to honour and remember the departed [see @haxaire_power_2009]. Thus, the preservation and appreciation of both the tangible and intangible aspects of the mask are essential to its cultural relevance. Another example of the blurred line between tangible and intangible heritage is provided by @de_muynke_ears_2022, who recreated reported perceptions of the acoustics of Notre-Dame de Paris through a collaboration between the sciences of acoustics and anthropology. The authors highlight the heritage value of how people subjectively perceive sound in a space, particularly in places of worship where sound and music are integral to the religious experience. The authors advocate integrating the study of both material and non-material aspects to understand the changing sonic environments of heritage buildings [@de_muynke_ears_2022 pp. 1-2]. @katz_digitally_2023 articulates that ‘acoustics is an intangible product of a tangible building’. This integrated perspective could lead to a more holistic understanding of the dynamics between physical spaces and the perceptual and experiential dimensions attached to them. 3.1.1.3 Natural Heritage Natural heritage, encompassing geological formations, biodiversity, and ecosystems of cultural, scientific, and aesthetic value, shares a significant overlap with CH. Many natural sites hold spiritual and symbolic importance for communities, becoming repositories of cultural memory and identity [@lowenthal_natural_2005]. Traditional ecological knowledge developed by various cultures also underscores the interconnectedness of cultural and natural heritage, as indigenous communities have accumulated wisdom on sustainable resource use and ecological balance [@azzopardi_what_2023]. Moreover, the conservation and sustainable management of natural heritage is often intertwined with efforts to protect CH, fostering a collective commitment to preserve these entangled legacies for future generations. The link between natural and cultural heritage goes beyond their shared values; spatial overlaps further accentuate their interdependence. Natural sites may have cultural significance, while CH sites may be situated within natural landscapes. 
For example, a national park may include archaeological sites or culturally revered landscapes, thus intertwining the cultural and natural dimensions. This spatial intermingling highlights the inextricable relationship between human societies and the natural environment, as cultural practices and beliefs become intertwined with the landscapes they inhabit. In this way, the preservation of both natural and cultural heritage becomes essential not only for their intrinsic worth but also for sustaining the narrative of our shared human and environmental history.

Additionally, the distinction between nature and culture is not merely subjective and dependent on human appreciation [@vandenhende_management_2017]. Rather, it is a concept intrinsically linked with the overarching framework of modernism, a perspective that has been critically examined and deconstructed by the influential sociologist and philosopher Bruno Latour, who argued that ‘we have never been modern’ [@latour_we_1993]. Latour’s deconstruction of the modernist perspective extends to the recognition that ‘the proliferation of hybrids has saturated the constitutional framework of the moderns’ [@latour_we_1993 p. 51]. This assertion underscores the fundamental challenge posed by hybrid entities – those that blur the boundaries between nature and culture – to the traditional categories upon which modernist thinking has been predicated. In essence, the concept of hybrids disrupts the neat divisions between the natural and social worlds that have been a hallmark of modernist discourse and provides us with an opportunity to situate ourselves as ‘amodern’ as opposed to postmodern [@latour_postmodern_1990]. Complementing Latour’s critique of the modernist distinction between nature and culture, the concept of the ‘parasite’, expounded by Michel Serres – one of the thinkers who significantly influenced Latour’s intellectual development [@berressem_deja_2015] – offers a valuable lens through which to examine the intricacies of interconnectedness and interdependence within our world. In his view, everything is enmeshed in a complex web of relationships that negates the existence of self-contained entities. Rather than seeing discrete and isolated entities, Serres invites us to see everything as an integral part of a larger system in which each component is inextricably dependent on the others [@serres_parasite_2014]. Together, these complementary perspectives invite us to reevaluate our understanding of the intricate tapestry of existence, emphasising the complexities of our relationship with the world.

Thus, the appreciation of nature and culture is not mutually exclusive, but rather forms a continuous and evolving relationship. The modern perspective has historically separated these realms, treating them as distinct and disconnected. However, a more inclusive approach dissolves this artificial boundary and recognises the interconnectedness of nature and culture [@haraway_encounters_2008; @haraway_staying_2016]. This paradigm shift challenges the traditional modern understanding and invites a more holistic view in which natural and cultural heritage are mutually constructed within a complex network of relationships. Recognition of this relationship is essential in the context of heritage conservation and understanding.
The dynamic interplay between nature and culture is recognised, and the acknowledgement of their coexistence promotes a more holistic approach to heritage conservation, where cultural practices, traditions and ecological systems are seen as interdependent aspects of the wider heritage tapestry. This recognition encourages us to see heritage sites not as isolated entities, but as part of a larger web of interconnectedness, and urges us to conserve and value both cultural and natural heritage with a shared responsibility. Adopting this interconnected perspective enables us to appreciate the profound connections between human societies and the natural world, and inspires a collective commitment to safeguarding these precious legacies for future generations.

3.1.1.4 Cultural Heritage Data

As I embark on the exploration of CH data, it is first necessary to establish a basic understanding of data in this context. At its core, data represents more than mere numbers and facts; it constitutes a collection of discrete or continuous values that are assembled for reference or in-depth analysis. In essence, data are the rich tapestry upon which the narratives of CH are woven, making their comprehension a critical prerequisite for our expedition into this domain. Luciano Floridi – a prominent philosopher in the field of information and digital ethics – provides a thorough perspective on the term ‘data’ in his PI and offers valuable insights into its fundamental nature. He perceives ‘data at its most basic level as the absence of uniformity, whether in the real world or in some symbolic system. Only once such data have some recognisable structure and are given some meaning can they be considered information’ [@floridi_information_2010]. This initial definition sets the stage for a deeper exploration of Floridi’s understanding of data, as he further focuses on its transformative journey into a more meaningful and structured form, which we will explore next. Building upon Floridi’s foundational concept of data as the absence of uniformity, a further definition provides a more comprehensive perspective. In an earlier work, @floridi_is_2005 [p. 357] argues that ‘data are definable as constraining affordances, exploitable by a system as input of adequate queries that correctly semanticise them to produce information as output’. This definition highlights the dynamic role of data, not only as raw entities awaiting structure and meaning but also as elements imbued with the potential to constrain and guide systems towards the generation of meaningful information. Transitioning from Floridi’s concept of data, we progress to the view that data can notably be seen as interpretable texts from a DH perspective. According to @owens_defining_2011, there are four main perspectives on how Humanists can engage with data:

- Data as constructed artefacts: data are a product of human creation, not something inherently raw or neutral;
- Data as interpretable texts: Humanists can interpret data as authored works, considering the intentions of the creators and how different audiences understand and use the data;
- Data as processable information: data can be processed by computers, allowing various forms of visualisation, manipulation and analysis, which can lead to further perspectives and insights;
- Data as holding evidentiary value: data, as a form of human artefact and cultural object, can provide evidence to support claims and arguments.
These considerations highlight the multifaceted nature of data within the field of DH. It is in this complex landscape that we recognise that data transcend their traditional role as a passive entity. As @rodighiero_mapping_2021 [p. 26, citing [@akrich_sociologie_2006]] suggests, ‘there is no doubt that data are full-fledged actors that take part in the social network the actor-network theory describes, in which both human and non-human intertwine and overlap’. This notion – rooted in and borrowed from STS – reinforces the idea that data, as an active and dynamic entity, play a significant role in shaping the interactions between human and non-human actors in digital spheres.

From these angles, I can look at the characteristics of CH data. @bruseker_cultural_2017 [p. 94] articulate that ‘data coming from the cultural heritage community comes in many shapes and sizes. Born from different disciplines, techniques, traditions, positions, and technologies, the data generated by the many different specializations that fall under this rubric come in an impressive array of forms’. In exploring CH data, it is thus important to recognise the inherent diversity stemming from different disciplines, techniques, and traditions. This heterogeneity raises fundamental questions about the unity and identity of CH data – a crucial aspect deserving acknowledgement within this context. As the authors astutely ponder:

It could be a natural problem to pose from the beginning: if the data of this community indeed presents itself in such a state of heterogeneity, does it not beg the question if there is truly an identity and unity to cultural heritage data in the first place? It could be argued that Cultural Heritage, as a term, offers a fairly useful means to describe the fuzzy and approximate togetherness of a wide array of disciplines and traditions that concern themselves with the human past.

Expanding on these insights, CH data refer to digital or data-driven affordances of CH[20], embodying a rich and varied compilation of insights originating from a variety of disciplines, techniques, traditions, positions and technologies. They encompass both tangible and intangible aspects of a society’s culture as well as natural heritage. These data, derived from a wide range of disciplines, offer a latent capacity to support the generation of knowledge relating to historical time periods, geospatial areas, as well as current and past human and non-human activities. They are collected, curated and maintained by various entities such as libraries, archives, museums, higher education institutions, non-governmental organisations, indigenous communities and local groups, as well as by the wider public.

Building further on the mosaic of CH data, three primary dimensions come to the fore: heterogeneity, knowledge latency, and custodianship.

Heterogeneity: as a fundamental characteristic, heterogeneity signifies the diverse forms and origins that shape this invaluable reservoir of human heritage. Different techniques and varying viewpoints in modelling also contribute to this heterogeneity [@guillem_faire_2023].

Knowledge latency: this dimension highlights the temporal aspect, presenting CH data as a repository of latent knowledge awaiting discovery and interpretation.
Notably, not all artefacts are – or should be – digitised, and even among those that are, (mis)representation and challenges in interconnecting them persist [@rossenova_iterative_2022]. Besides, the issue of structured data – or the lack thereof – reinforces this aspect of knowledge latency [@haciguzeller_emerging_2021].

Custodianship: this dimension reinforces the essential role played by a variety of entities, predominantly CHIs, in safeguarding and managing resources, ensuring their preservation and accessibility for present and future generations. However, it is important to acknowledge the great divide in terms of resources, with indigenous and local communities often facing challenges in assuming custodianship responsibilities.

Taken together, these dimensions contribute to a comprehensive understanding of the nuanced fabric of CH data. They reveal the diversity of forms and origins, the temporal aspects and the responsible stewardship that are crucial to the sustainability of such data.

By shifting our focus to the sphere of humanities data, we broaden our scope beyond the peculiarities of CH data. Drawing parallels between these areas allows us to grasp the interconnectedness of our heritage. CH data usually refer to information about cultural artefacts, sites, and practices that hold historical or cultural significance, while humanities data encompass information about human culture, history, and society, including literature, philosophy, art, and language [@tasovac_cultural_2020]. Both often involve ethical considerations, such as ownership, access, and preservation, and require a comprehensive understanding of their various meanings and values [@ioannides_towards_2019]. Moreover, @schoch_big_2013 explains that data in the humanities, such as text and visual elements, have unique qualities. While these analogue forms could be considered data, they lack the ability to be analysed computationally as they are non-discrete. The semiotic nature of language, text and art introduces dimensions tied to meaning and context, making the term ‘data’ problematic. Critics question its use because it conflicts with humanistic principles such as contextual interpretation and the subjective position of the scholar. @schoch_big_2013 further distinguishes two core types of data in the humanities: smart data and big data. The former tends to be small in volume and carefully curated, but harder to scale, such as digital editions. The latter describes voluminous and varied data and loosely relies on the three Vs defined by @laney_3d_2001: volume, velocity and variety (see 3.3.1.2). Yet, big data in the humanities differs significantly from other fields as it rarely requires rapid real-time analysis, is less focused on handling massive volumes, and instead deals with diverse, unstructured data sources. @schoch_big_2013 concludes: ‘I believe the most interesting challenge for the next years when it comes to dealing with data in the humanities will be to actually transgress this opposition of smart and big data. What we need is bigger smart data or smarter big data, and to create and use it, we need to make use of new methods’. Data processing offers great potential for humanities research, as @owens_defining_2011 argues: ‘In the end, the kinds of questions humanists ask about texts and artifacts are just as relevant to ask of data.
While the new and exciting prospects of processing data offer humanists a range of exciting possibilities for research, humanistic approaches to the textual and artifactual qualities of data also have a considerable amount to offer to the interpretation of data’. While the term ‘data’ in the context of the humanities may raise questions due to its semiotic and contextual complexities, it serves as a foundation for understanding both CH data and broader humanities data. The data originating from CH and the humanities are inherently intertwined, as they often share a similar nature and purpose for scholars. This strong interconnection leads to a collaborative relationship between the GLAM sector and the humanities or DH. Scholars in the humanities frequently rely on digitised cultural artefacts, historical records, linguistic resources, and literary works provided by GLAM institutions to gain valuable insights into human history, culture, and traditions. The digitisation efforts and research collaborations between these entities play a pivotal role in preserving CH data and advancing our understanding of diverse societies, fostering a deeper appreciation of our shared human heritage. CH data and humanities data are distinct from other scientific data due to their qualitative and subjective nature, which requires different methods of analysis than quantitative scientific data. They include archival and special collections, rare books, manuscripts, photographs, recordings, artefacts, and other primary sources that reflect the cultural beliefs, identity, and memory of a people [see @sabharwal_2_2015; @izu_sociocultural_2022]. In summary, while CH data and humanities data share some commonalities, they differ in terms of scope and subject matter. CH data focus specifically on the preservation and documentation of physical artefacts and intangible attributes, while humanities data encompass a broader range of disciplines within the humanities [@munster_digital_2019]. However, it is important to note that the distinction between CH data and humanities data can be blurred, as (meta)data should ideally be co-created and integrated across both domains.

3.1.2 Representation and Embodiment of Cultural Heritage Data

Representing CH data digitally while preserving their context and complexity remains a significant challenge. These representations, sometimes referred to as digital surrogates or digital twins [@conway_digital_2015; @shao_digital_2018; @semeraro_digital_2021], can potentially lead to a loss of context and a reduction in the richness of the CH represented. For instance, a digital image of a cultural artefact may not capture its materiality, such as its texture, weight, and feel, which are essential aspects of the artefact’s cultural significance [@force_context_2021]. Furthermore, digital representations may also exclude vital social, cultural, and historical contexts surrounding the object, which are crucial to understanding its full cultural value [@cameron_beyond_2007]. This subsection is structured around two key dimensions. Firstly, it explores materiality, highlighting how digital representations may fail to capture important aspects that are integral to understanding the significance of CH resources. Secondly, it navigates the convergence and divergence between digitised CH and digital heritage.

3.1.2.1 Materiality

Briefly, materiality refers to the physical qualities of an object or artefact, such as its colour, texture, and composition.
In the context of built heritage, materiality relates primarily to architecture, its associated techniques and the range of materials used in the construction or renovation of a building. More specifically, materiality acts as a pivotal factor in the transformation of disparate fragments of material culture into heritage, providing a vital link to the intangible facets of heritage. It contributes significantly to an individual’s social position and ability to navigate specific social milieus, thereby determining their ability to transmit cultural knowledge and values to future generations. The transformative potential of materiality in this regard underscores its fundamental role in perpetuating heritage and the transmission of cultural legacies [@carman_where_2009]. The physical attributes of objects, including texture, colour and shape, can evoke different emotions and associations, shaping people’s perceptions and memories. Beyond retrospective influences, the potential of materiality extends to the creation of new memories and meanings, as exemplified by the use of materials such as glass in contemporary art. In such cases, materials evoke not only their inherent properties but also symbolic connotations, adding new layers of meaning and memory to the artistic narrative [@fiorentino_persistence_2023]. @edwards_photographs_2004 [p. 3] argue that materiality is not just concerned with physical objects in a positivist sense, but also involves complex and fluid relationships between people, images, and things. This relationship is influenced by social, cultural, and historical contexts, and plays a crucial role in shaping our perceptions and experiences of the world. Moreover, materiality is central to giving meaning to non-human entities [see @latour_actor-network_1996; @haraway_companion_2003; @star_institutional_1989], which emphasises the role of both humans and non-humans in shaping social and cultural phenomena. For CH data, diversity is at its core, as it allows for the exploration of different ways of knowing, experiencing, and expressing the world. Therefore, it is important to approach materiality not as a static and fixed concept, but as a dynamic and evolving phenomenon that is shaped by multiple forces [@hahn_digitale_2018 pp. 62-63]. Discussions of materiality also involve its negation, i.e. the notion of space or emptiness, and how people interact with it through built heritage, which is regarded as a primordial medium of material culture, as expounded by @guillem_rcc8_2023 [p. 2]:

The most intuitive and foundational definition of architecture is the built thing, that is the architecture qua building or built work. Human beings continuously interact with the built materiality through the non-materiality of space. Space as emptiness is formed and defined by the materiality that affects its existence. That relation between fullness and emptiness is what makes possible architecture as lived and experienced space.

Materiality also offers a means of challenging dominant narratives and power structures, particularly the Western-centric perspective on CH. It gives greater recognition to the importance of intangible CH, which often takes a back seat to tangible objects in dominant narratives [@lenzerini_intangible_2011].
By highlighting the materiality of marginalised or forgotten elements, individuals can reclaim their heritage and challenge dominant narratives that exclude certain groups, contributing to a more inclusive and accurate representation of CH. The primary focus of digitisation is also on preserving material-based knowledge, often overlooking the dynamic and living nature of intangibility. @hou_digitizing_2022 stress the crucial role of computational heritage and advances in information technologies in preserving and improving access to intangible CH. Effectively documenting the ephemeral aspects of intangible heritage and communicating the knowledge that is deeply linked to individuals are pressing challenges. Recent initiatives seek to capture the dynamic facets of cultural practices, using visualisation, augmentation, participation and immersive experiences to enhance experiential narratives. There is a strong call for a strategic re-evaluation of the intangible CH digitisation process, emphasising the human body as a vessel for traditions and memories – for example, traditional Southern Chinese martial arts, which have been passed down orally across generations and require a methodological approach to capture such embodied knowledge [see @adamou_facets_2023; @hou_ontology-based_2024]. Even in cases where considerable efforts have been devoted to the digitisation of physical objects such as medieval manuscripts and rare books over the past few decades [@nielsen_digitisation_2008], a lingering concern persists regarding the authentic encounter with the original artefact, despite its enhanced accessibility through digital surrogates [@van_lit_digital_2020]. Material attributes present a persistent challenge to full replication. Techniques such as RTI, 3D digitisation, VR and AR offer better experiential immersion and address certain materiality concerns more effectively than two-dimensional representations. Nevertheless, replicating the multifaceted sensory experience associated with the original object, including its palpable emotions and spatial sensation, remains an ongoing endeavour – one that may never be fully feasible [see @endres_digitizing_2019].

3.1.2.2 Digitised Cultural Heritage and Digital Heritage

The concepts of digitised CH and digital heritage intersect through the use of digital technology for the preservation, access, and dissemination of CH resources. Digitised CH focuses on converting physical artefacts into digital forms, ensuring their long-term preservation and accessibility through digital means. Conversely, digital heritage includes a broader range of digital tools and resources ‘to preserve, research and communicate cultural heritage’ [@munster_digital_2021 p. 2, citing [@georgopoulos_cipas_2018]]. Digitised CH acts as a critical bridge, facilitating the transition from traditional or analogue GLAM practices to a digital environment. This shift is pivotal in unlocking the potential of digitised CH, whose value extends beyond scholarly pursuits, even though the majority of digitisation efforts are driven by research funding. It thus becomes evident that the creative reuse and data-driven innovation stemming from digitised CH necessitate substantial and sustained investment in the GLAM sector. This investment is fundamental, especially amidst reduced funding due to years of austerity.
@terras_value_2021 underscore this need, shedding light on the delicate balance required with commercial outcomes. They emphasise that leveraging CH datasets offers vast opportunities for technological innovation and economic benefits, urging professionals from various domains to collaborate and experiment in a low-risk environment. Digital heritage[21] encompasses a wide range of human knowledge and expression in cultural, educational, scientific and various other domains. In today’s rapidly evolving technological landscape, an increasing amount of this knowledge is either digitally created or in the process of being converted from analogue to digital formats [@he_digital_2017]. These digital resources cover a wide range, including text, multimedia, software and more, and require deliberate and strategic management to ensure their long-term preservation. This valuable heritage is spread across the globe and expressed in multiple languages [@unesco_charter_2009]. In summary, digitised CH not only forges the path to digital heritage but also embodies an ever-evolving cultural landscape. Recognising the transformative potency of digital heritage is essential to enriching our understanding of and engagement with our cultural roots. Both concepts are intimately embedded in CH and play a vital role as conduits.

3.1.3 Collectives and Apparatuses

The collaborative efforts of collectives and the operation of various apparatuses play a fundamental part in shaping the preservation, interpretation and dissemination of cultural artefacts and practices. This subsection is concerned with the central contributions of human and non-human actors engaged in cooperative action and the modus operandi of various apparatuses, such as building (digital) infrastructures. Some of these considerations are drawn from STS and are more fully captured in the chapter serving as the theoretical framework of the thesis. Bruno Latour’s concept of the importance of collectives and apparatuses [see @latour_habiter_2022 p. 15] can be extrapolated to CHIs. Every institution’s or project’s ultimate success hinges on the collaboration and support of individuals, as well as the tools, systems and technologies they use. Indeed, paralleling CHIs with wider contexts suggests that collective efforts and apparatuses play a critical role in shaping the effectiveness of any institution. This highlights the importance of recognising the influence of both human and non-human entities in institutional functioning and underlines the need for a more comprehensive understanding of the dynamics involved therein. ANT can be a useful lens through which to analyse the creation, use, and dissemination of CH data. ANT posits that actors are not independent entities but are instead part of a network that consists of both human and non-human entities. According to ANT, every actor, be it a person or a technology, is a node in the network and contributes to the overall functioning of the network [@latour_reassembling_2005; @callon_actor_2001]. When we apply this framework to CHIs, we can identify the different actors involved in the creation, use, and dissemination of CH data. These actors can include individuals, such as curators, conservators, and historians, as well as non-human entities, such as databases, digitisation equipment, and software. Moreover, this approach can help us understand the interactions between these actors and how they shape the overall functioning of CHIs.
For instance, digitisation equipment can enable the creation of high-quality digital images of artefacts, which can then be disseminated globally through online platforms. Examining Notre-Dame de Paris, one can discern the keystones at the summit of its arches as indispensable actors within its architectural narrative. These keystones, imbued with historical narratives and a non-human facet, played a central role in the (digital) rescue and subsequent restoration efforts following the tragic roof fire of April 2019. @guillem_faire_2023’s study further elucidates this restoration journey, emphasising how the keystones, with their individual narratives and structural significance, contributed to the (digital) reassembly.

Building on this perspective, we can explore the importance of community involvement in the preservation and management of CH data, thereby increasing the potential for sustainable practices and inclusive engagement. Local communities have an integral part to play in the management and preservation of CH data, especially in the digital age where resources are often scarce for GLAM institutions. Community involvement has several benefits, including increased engagement and participation, access to local knowledge and expertise, and more sustainable and inclusive management and preservation practices [@ridge_12_2021]. For instance, geophysical technologies such as ground-penetrating radar have been used with great success in identifying and evaluating the depth, extent, and composition of CH resources for research and management purposes, easing tensions when working with sensitive ancestral places [@nelson_role_2021]. Collaborative environments can also help with CH information sharing and communication tasks because of the way in which they provide a visual context to users, making it easier to find and relate CH content [@respaldiza_hidalgo_metadata_2011]. Drawing on @brown_communities_2023 [pp. 6-7]'s insightful analysis, a prominent illustration of exemplary community practice can be found in the sphere of community museums in Latin America: Inicio - Museos Comunitarios de América[22]. The author highlights the role of community engagement and leadership in the creation and operation of these museums. Such engagement ensures that these museums are not imposed from outside, but rather emerge organically as museums of the community, resonating with its unique CH and identity. This approach is consistent with the ethos of ‘telling a story’ to build a future, which embodies a deep commitment to community empowerment and cultural preservation. This community-centric approach amplifies the museum’s resonance with the community’s lived experiences and historical narratives.

At the same time, institutions can also benefit from collaborating with peer communities like IIIF to promote greater access to their collections. IIIF provides a set of open standards for delivering high-quality digital objects online at scale, which can help memory and academic institutions share their collections with each other and with the wider public [@snydman_international_2015; @weinthal_iiif_2019]. By adopting IIIF standards, organisations can make their collections more discoverable and accessible to researchers, developers, and other CH professionals [@padfield_joseph_practical_2022]. Involvement in communities such as IIIF also helps to mitigate costs as they develop shared or adaptable resources and services [@raemy_international_2017].
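To give a concrete sense of what these shared specifications look like in practice, the following is a minimal sketch of a IIIF Presentation API 3.0 Manifest describing a single-image object. The structure (Manifest → Canvas → AnnotationPage → painting Annotation) follows the specification, but the example.org URIs, the label and the dimensions are hypothetical placeholders rather than data from any actual institution.

```json
{
  "@context": "http://iiif.io/api/presentation/3/context.json",
  "id": "https://example.org/iiif/manifest/1",
  "type": "Manifest",
  "label": { "en": ["Example photograph"] },
  "items": [
    {
      "id": "https://example.org/iiif/canvas/1",
      "type": "Canvas",
      "height": 1800,
      "width": 1200,
      "items": [
        {
          "id": "https://example.org/iiif/page/1",
          "type": "AnnotationPage",
          "items": [
            {
              "id": "https://example.org/iiif/annotation/1",
              "type": "Annotation",
              "motivation": "painting",
              "body": {
                "id": "https://example.org/images/photo.jpg",
                "type": "Image",
                "format": "image/jpeg",
                "height": 1800,
                "width": 1200
              },
              "target": "https://example.org/iiif/canvas/1"
            }
          ]
        }
      ]
    }
  ]
}
```

Because any IIIF-compliant client – Mirador or the Universal Viewer, for example – can render a resource described in this way, institutions publishing such manifests make their collections interoperable at the presentation level without having to agree on a common collection management system.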
Participation of communities in the management and preservation of CH resources is essential to ensure that CH is protected and accessible for future generations. By involving communities and participating in them, GLAMs can tap into local as well as peer knowledge and expertise, making management and preservation practices more sustainable and inclusive. This approach also increases engagement and participation, ensuring that CH is valued and appreciated by the wider community. Thus, memory institutions need to collaborate closely with communities to ensure that CH data, and their underlying infrastructures and services, are effectively curated [@delmas-glass_fostering_2020]. Closely related to this context, @star_ethnography_1999 points out the often unacknowledged role of infrastructure within society. She argues that infrastructures are necessary but often invisible and taken for granted:

People commonly envision infrastructure as a system of substrates – railroad lines, pipes and plumbing, electrical power plants, and wires. It is by definition invisible, part of the background for other kinds of work. It is ready-to-hand. This image holds up well enough for many purposes – turn on the faucet for a drink of water and you use a vast infrastructure of plumbing and water regulation without usually thinking much about it. [@star_ethnography_1999 p. 380]

@star_ethnography_1999 [pp. 381-382, citing [@star_steps_1994]] identifies nine dimensions that define infrastructure. They provide a comprehensive framework for comprehending the nuanced nature of infrastructure and its pervasive impact on diverse societal facets. The following dimensions are vital for analysing the often imperceptible, yet deeply embedded structures that constitute the foundational framework of both daily life and broader societal operations[23]:

- Embeddedness: infrastructure is sunk into and inside of other structures, social arrangements, and technologies. People do not necessarily distinguish the several coordinated aspects of infrastructure.
- Transparency: infrastructure is transparent to use, in the sense that it does not have to be reinvented each time or assembled for each task, but invisibly supports those tasks.
- Reach or scope: this may be either spatial or temporal – infrastructure has reach beyond a single event or one-site practice.
- Learned as part of membership: strangers and outsiders encounter infrastructure as a target object to be learned about. New participants acquire a naturalised familiarity with its objects as they become members.
- Links with conventions of practice: infrastructure both shapes and is shaped by the conventions of a community of practice.
- Embodiment of standards: modified by scope and often by conflicting conventions, infrastructure takes on transparency by plugging into other infrastructures and tools in a standardised fashion.
- Built on an installed base: infrastructure does not grow de novo; it wrestles with the inertia of the installed base and inherits strengths and limitations from that base.
- Becomes visible upon breakdown: the normally invisible quality of working infrastructure becomes visible when it breaks: the server is down, the bridge washes out, there is a power blackout.
- Is fixed in modular increments, not all at once or globally: because infrastructure is big, layered, and complex, and because it means different things locally, it is never changed from above. Changes take time and negotiation, and adjustments with other aspects of the systems are involved.
An appreciation of these dimensions is crucial for analysing the network of infrastructural systems that underpin contemporary society, including any digital infrastructure that manages CH data. Digital infrastructures – also known as e-infrastructures or cyberinfrastructures – are forms of infrastructure that are essential for the functioning of today’s society [see @jackson_understanding_2007; @ribes_sociotechnical_2010]. These kinds of infrastructure need to be understood as socio-technical systems, showcasing the interplay between technological components (such as hardware, software, and networks) and the social and organisational contexts in which they operate [@star_steps_1994]. According to @fresa_data_2013 [p. 33], digital CH infrastructures should be able to serve the research needs of humanities scholars as well as having dedicated services for education, learning, and general public access. In terms of requirements, @fresa_data_2013 [pp. 36-39] identifies three different layers of services: for content providers, for managing and adding value to the content, and for the research communities. For the latter, several sub-services are listed, encompassing long-term preservation, PIDs[24], interoperability and aggregation, advanced search, data resource set-up, user authentication and access control, as well as rights management. Overall, (digital) infrastructures are imperative apparatuses for preserving and sharing CH data. First, they support preservation by archiving digital artefacts and their metadata, protecting them from deterioration and loss. Second, these infrastructures facilitate accessibility, allowing a global audience to explore and appreciate cultural heritage online. Finally, they encourage interpretation and engagement, promoting cross-cultural understanding and knowledge sharing. Moreover, infrastructure is a fundamental component that demands extensive investment, particularly in the creation of streamlined integration layers capable of interacting seamlessly with different systems. This can be exemplified by institutions such as the Rijksmuseum[25], where a well-constructed infrastructure allows for efficient integration and interaction with various technological and organisational systems [@dijkshoorn_building_2023]. This investment serves as the foundation for an institution’s functionality, allowing for the smooth flow of data, the coordination of processes and the optimal use of resources. In a similar vein, @canning_power_2022 argue that the often invisible structures of metadata, particularly in Linked Data ontologies, play a crucial role in shaping the interpretation of data. These structures, while not immediately apparent, are imbued with value judgements and ideological implications, extending the impact of metadata beyond mere technicalities to encompass diverse and intersectional perspectives. This multidimensional ontological approach addresses the complexity and diversity of data sources, paralleling the need for sophisticated infrastructures in institutions like the Rijksmuseum. It underscores the importance of integrating intersectional feminist principles in information systems, reflecting a commitment to diverse ways of knowing and nuanced storytelling. Furthermore, as all (meta)data require storage, an important concern arises in terms of the entrenched power dynamics governing knowledge representation within information systems, as pointed out by @canning_what_2023.
This perspective, initially centred on museum objects, holds broader implications for all CH resources [see @simandiraki-grimshaw_what_2023]. Canning strongly advocates the essential adaptation of databases to embrace a diverse array of epistemological approaches by introducing new types of affordances. Databases, despite their role in information preservation, wield significant influence that can inadvertently stifle diverse modes of knowledge interpretation and ‘can constrain ways of knowing’. Furthermore, she compellingly argues that modifications to databases extend beyond technical adjustments; they are inextricably linked to shifts in institutional power dynamics and the enduring, often inequitable, power dynamics governing the world of museums – or any CHIs – and their curation. In understanding the interplay of collectives and apparatuses, it is clear that key actors, including individuals, institutions, local and global communities, as well as the sophisticated fabric of (digital) infrastructures and their components, are deeply entangled and interconnected. These entities, both human and non-human, collectively shape and navigate the rich networks of human interactions and technologies that underpin the foundations of contemporary society.

3.2 Cultural Heritage Metadata

This subsection offers insights into the importance of metadata in CH, underlining its role in enhancing the understanding and accessibility of cultural artefacts. It is structured into four[26] essential parts. I start with an introductory segment in 3.2.1; then I explore the types and functions of metadata in 3.2.2; thirdly, in 3.2.3, I outline some of the most important CH metadata standards; and finally, in 3.2.4, I explore the use of KOS, such as generic classification systems and controlled vocabularies.

3.2.1 Data about Data

For curating CH resources, metadata[27], ‘data about data’, is probably one of the key concepts that needs to be introduced here. Metadata permeate our digital and physical landscapes, playing a vital role in organising, describing and managing a vast array of information. Rather than being confined to a specific domain, they are ubiquitous and pervade many aspects of our everyday lives [@riley_understanding_2017 pp. 2-3]. From websites and databases to social media platforms and online marketplaces, metadata add meaning to data, enabling users to understand their context, relevance and provenance. As an example, Figure 3.1 shows the metadata of a book[28].

Figure 3.1: Snapshot from the Swisscovery Platform Showing the Bibliographic Record of @zeng_metadata_2022

Metadata are central to the management and preservation of CH data, providing essential information to ensure that data can be properly organised, discovered and retrieved. For example, they facilitate the understanding and interpretation of data, enabling scholars and the public to access and use them effectively [@constantopoulos_aspects_2008]. Metadata also help to ensure the long-term preservation and accessibility of CH data [@zeng_metadata_2022 pp. 490-491]. Providing metadata in a structured manner facilitates forms of aggregation, i.e. enabling individuals and institutions to harvest and organise metadata from multiple sources or repositories into a centralised location [see @freire_survey_2017; @freire_metadata_2021]. In addition, the importance of metadata as a gateway to information is particularly compelling when the primary embodiment of a record is either unavailable or lost.
In cases where resources, time constraints, sensitive content or strategic decisions prevent the digitisation of an item, metadata become the principal means of representation and access. If a physical record is lost or damaged, the metadata associated with that record act as a proxy for it. @riley_understanding_2017 [p. 5] discusses the transformation of libraries over time: driven by advances in computerisation, libraries moved from search terminals to the modern web-based resource discovery systems we use today. Libraries’ basic approach to metadata is ‘bibliographic’, deeply rooted in their traditional expertise in describing books. This approach involves providing detailed descriptions of individual items so that users can easily locate them within the library’s collection. Archives, on the other hand, use ‘finding aids’, which are descriptive inventories of their collections, coupled with historical context. These aids are essential for users to understand the material and to find groups of related items within the archive. The metadata used in archives allow for the contextualisation of materials, particularly papers of individuals or records of organisations, providing a richer understanding of the content. Similarly, museums actively manage and track their acquisitions, exhibitions and loans through metadata. Museum curators use metadata to interpret collections for visitors, explaining the historical and social significance of artefacts and describing the relationships and connections between different objects. This helps to enhance the overall visitor experience and understanding of the artefacts on display or the digital resources on a particular website.

3.2.2 Types and Functions

CHIs share common objectives and concerns related to information management, as highlighted by @lim_metadata_2011 [pp. 484-485]. These goals typically include facilitating access to knowledge and ensuring the integrity of CH data. However, it is important to note that CHIs also differ widely in how they deal with metadata. Different domains have unique approaches and standards for describing the materials they collect, preserve and disseminate, and even within a single domain there are significant differences. There have been several attempts to categorise the metadata landscape. For instance, @baca_setting_2016 identified the following five categories of metadata and their respective functions:

- Administrative: metadata used in managing and administering collections and information resources, such as acquisition and appraisal information or documentation related to repatriation.
- Descriptive: metadata used to identify, authenticate, and describe collections and related trusted information resources. Finding aids, cataloguing records, annotations by practitioners and end users, as well as metadata generated by or through a given DAM system can often be classified as descriptive metadata.
- Preservation: metadata related to the preservation management of collections and information resources. Common examples of preservation metadata are documentation of the physical condition of resources or of any actions taken to preserve them, whether physical restoration or data migration.
- Technical: metadata related to how a system functions or how metadata behave. Examples include software documentation and digitisation information.
- Use: metadata related to the level and type of use of collections and information resources, such as circulation records, search logs, or rights metadata.

Meanwhile, @riley_seeing_2009, as illustrated in a comprehensive visualisation graph[29], suggested seven functions – i.e. the role a standard plays in the creation and storage of metadata – and seven purposes, referring to the general type of metadata.

Functions: Conceptual Model, Content Standard, Controlled Vocabulary, Framework/Technology, Markup Language, Record Format, and Structure Standard.

Purposes: Data, Descriptive Metadata, Metadata Wrappers, Preservation Metadata, Rights Metadata, Structural Metadata, and Technical Metadata.

Almost a decade later, @riley_understanding_2017 [pp. 6-7] summarised metadata types into four groupings instead of the seven purposes previously mentioned: ‘Data’ is removed from the list, and technical, preservation and rights metadata are now grouped into a newly created administrative metadata category.

1. Descriptive metadata: for finding or understanding a resource
2. Administrative metadata: an umbrella term referring to the information needed to manage a resource or that relates to its creation
   - 2.1 Technical metadata: for decoding and rendering files
   - 2.2 Preservation metadata: for the long-term management of files
   - 2.3 Rights metadata: for the intellectual property rights attached to content
3. Structural metadata: for the relationships of parts of resources to one another
4. Markup languages: integrate metadata and flags for other structural or semantic features within content[30]

This classification of metadata types and functions differs from the categories identified by @baca_setting_2016 mostly due to the addition of structural metadata and markup languages as their own categories [@zeng_metadata_2022 p. 19]. Table 3.1 lists the major types of metadata according to @riley_understanding_2017 [p. 7] and includes example properties and their primary uses.

Table 3.1: Types of Metadata According to @riley_understanding_2017 [p. 7]

| Metadata (sub)type | Example properties | Primary uses |
|---|---|---|
| 1. Descriptive metadata | Title, Author, Subject, Genre, Publication date | Discovery, Display, Interoperability |
| 2.1 Technical metadata | File type, File size, Creation date, Compression scheme | Interoperability, Digital object management, Preservation |
| 2.2 Preservation metadata | Checksum, Preservation event | Interoperability, Digital object management, Preservation |
| 2.3 Rights metadata | Copyright status, Licence terms, Rights holder | Interoperability, Digital object management |
| 3. Structural metadata | Sequence, Place in hierarchy | Navigation |
| 4. Markup languages | Paragraph, Heading, List, Name, Date | Navigation, Interoperability |

Ultimately, metadata can also be leveraged to create more inclusive and diverse representations of CH. For instance, metadata can be used to document and promote underrepresented communities and their heritage, providing greater visibility and recognition. This approach aligns with the principles of decolonising CH, promoting equity and social justice by recognising and valuing diverse cultural perspectives, especially against the prevailing anglophone and Western-centric standpoint in DH [@mullaney_internet_2021; @mahony_cultural_2018]. Moreover, the distinction between data and metadata, as discussed in the work of @alter_view_2023, is not always clear-cut, leading to the concept of ‘semantic transposition’.
This complexity is reflected in CH, where what is considered metadata in one context might be primary data in another, underscoring the necessity for adaptable frameworks in data management. This understanding is crucial for fostering inclusive and diverse representations in CH, ensuring that all cultural narratives are appropriately documented and acknowledged.

3.2.3 Standards

Metadata standards play a crucial role in ensuring that data are organised and consistent, facilitating mutual understanding between different stakeholders [@raemy_enabling_2020]. CHIs such as GLAMs typically follow established conventions or standards when organising their resources. Current methods of cataloguing have historical roots dating back to the 19th century, particularly with the development of cataloguing systems such as Antonio Panizzi’s at the British Museum and Charles Coffin Jewett’s efforts to mechanically duplicate entries at the library of the Smithsonian Institution [@zeng_metadata_2022 pp. 14-15]. Unique metadata standards, rules and models have been established and maintained within specific sub-fields. In addition, certain standards for information resources have been endorsed by authoritative bodies [@greenberg_understanding_2005], and some are used exclusively within specific domain communities [@hillmann_metadata_2008]. @riley_understanding_2017 [p. 5] underscores the predilection of CH metadata – whether these standards emanate from libraries, archives, or museums – toward accentuating descriptive attributes. The foundational CH metadata standards manifest this thematic focus [@zeng_metadata_2022 p. 11]. Within the CH domain, metadata standards vary widely in scope, and a number of different standards have been developed to meet different needs and priorities[31] [@freire_availability_2018]. The following quoted passage sheds some light on the different approaches and levels of collaboration in metadata standardisation, namely among the library and museum sectors:

Despite the striving for homogeneity, in practice, the production of metadata among information specialists and the use of metadata standards is already marked by considerable diversity. This has come about for very pragmatic reasons. Different types of objects and collections require different types of metadata. The curatorial interest for particular information differs for example between images held in an art gallery and a library, as does the information specialists’ domain expertise. Accordingly, diversity in metadata practice seems to be greatest in museums as they are the institutions that govern the most diverse collections. While the library sector has ‘systematically and cooperatively created and shared’ metadata standards since the 1960s, the museum sector, mostly handling images and objects, has been slower to establish such collaboration and consensus. [@dahlgren_diversity_2020 p. 244]

In this context, I want to focus on some metadata standards that have proved vital across libraries, archives, museums and galleries. These standards, which I will briefly describe, serve as the foundation for organising, describing, and enabling efficient access to vast and diverse collections. Of particular interest, I will take a closer look at CIDOC-CRM, as it serves as the cornerstone of Linked Art, a fundamental LOUD standard.

3.2.3.1 Library Metadata Standards

In libraries, several metadata standards have played crucial roles in organising and accessing collections over the years.
The most prevalent historical standard, MARC[32], originated as a pilot project in the 1960s, funded by the CLIR and led by the LoC, to structure cataloguing data and distribute them through magnetic tapes [@avram_marc_1968 p. 3]. The standard evolved into MARC21 in 1999 [@zeng_metadata_2022 p. 418] – as exemplified by Code Snippet 3.1 – providing a structured format for bibliographic records and related information in machine-readable form. It uses codes, fields, and sub-fields to structure data. Another significant historical standard is the AACR, published in 1967 and revised in 1978, which provides sets of rules for the descriptive cataloguing of various types of information resources.

Code Snippet 3.1: MARC21 Record of @zeng_metadata_2022 in the Swisscovery Platform

```
leader 01424nam a2200397 c 4500
001 991170746542405501
005 20220427104002.0
008 210818s2022 xxu b 001 0 eng
010 ##$a 2021031231
020 ##$a9780838948750 $qBroschur
020 ##$a0838948758
035 ##$a(OCoLC)1264724191
040 ##$aDLC $bger $erda $cDLC $dCH-ZuSLS UZB ZB
042 ##$apcc
050 00$aZ666.7 $b.Z46 2022
082 00$a025.3 $223
082 74$a020 $223sdnb
100 1#$aZeng, Marcia Lei $d1956- $4aut $0(DE-588)136417035
245 10$aMetadata $cMarcia Lei Zeng and Jian Qin
250 ##$aThird edition
264 #1$aChicago $bALA Neal-Schuman $c2022
300 ##$axxvi, 613 Seiten $bIllustrationen
336 ##$btxt $2rdacontent
337 ##$bn $2rdamedia
338 ##$bnc $2rdacarrier
504 ##$aIncludes bibliographical references and index
650 #0$aMetadata
650 #7$aMetadata $2fast $0(OCoLC)fst01017519
650 #7$aMetadaten $2gnd $0(DE-588)4410512-5
776 08$iErscheint auch als $nOnline-Ausgabe $tMetadata $z9780838937969
776 08$iErscheint auch als $nOnline-Ausgabe $tMetadata $z9780838937952
700 1#$aQin, Jian $d1956- $4aut $0(DE-588)1056085541
856 42$3Inhaltsverzeichnis $qPDF $uhttps://urn.ub.unibe.ch/urn:ch:slsp:0838948758:ihv:pdf
900 ##$aOK_GND $xUZB/Z01/202203/klei
900 ##$aStoppsignal FRED $xUZB/Z01/202203
949 ##$ahttps://urn.ub.unibe.ch/urn:ch:slsp:0838948758:ihv:pdf
```

AACR is no longer maintained and was replaced around 2010 by RDA[33], a standard more adaptive to contemporary needs. RDA, while not a markup language like MARC, serves as a content standard that guides the description and discovery of resources, focusing on user needs and facilitating improved navigation of library collections. Its goal is to provide a flexible and extensible framework for the description of all types of resources, ensuring discoverability, accessibility, and relevance for users[34] [@sprochi_where_2016 p. 130]. Libraries often leverage other standards to enrich their metadata practices. MODS[35], introduced in 2002, offers a more flexible XML-based schema for bibliographic description, allowing for better integration with other standards and systems. It was initially developed to carry selected data from existing MARC21 records [@zeng_metadata_2022 p. 423]. MODS provides a balance between human readability and machine processing, making it suitable for a wide range of resources and use cases [@guenther_mods_2003 p. 139]. METS[36], on the other hand, is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library. Developed as an initiative of the DLF, METS provides a flexible and extensible framework for structuring metadata, allowing for the packaging of complex digital objects [@cantara_mets_2005 pp. 238-239]. While MODS is primarily concerned with bibliographic information, METS focuses on structuring metadata for digital objects, making it particularly useful for digital libraries and repositories.
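For contrast with the numeric tags of MARC21 above, the following is a short, hand-written MODS sketch describing the same book as Code Snippet 3.1. It is illustrative only – not an export from Swisscovery – and the selection of elements (title, authors, origin information, extent, subject and ISBN) is my own reduction of the record.

```xml
<!-- Illustrative MODS 3.x record for Zeng & Qin, Metadata, 3rd ed.
     (hand-written sketch, not harvested from an actual catalogue) -->
<mods xmlns="http://www.loc.gov/mods/v3" version="3.7">
  <titleInfo>
    <title>Metadata</title>
  </titleInfo>
  <name type="personal">
    <namePart>Zeng, Marcia Lei</namePart>
    <role><roleTerm type="text">author</roleTerm></role>
  </name>
  <name type="personal">
    <namePart>Qin, Jian</namePart>
    <role><roleTerm type="text">author</roleTerm></role>
  </name>
  <typeOfResource>text</typeOfResource>
  <originInfo>
    <place><placeTerm type="text">Chicago</placeTerm></place>
    <publisher>ALA Neal-Schuman</publisher>
    <dateIssued>2022</dateIssued>
    <edition>Third edition</edition>
  </originInfo>
  <physicalDescription>
    <extent>xxvi, 613 pages</extent>
  </physicalDescription>
  <subject>
    <topic>Metadata</topic>
  </subject>
  <identifier type="isbn">9780838948750</identifier>
</mods>
```

Where MARC21 encodes the publisher in field 264 subfield $b, MODS uses the self-describing originInfo/publisher path – one reason the schema integrates more readily with other XML-based standards such as METS.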
A further important standard is FRBR, a conceptual framework for understanding and structuring bibliographic data and access points. Originally developed by IFLA in 1997 as part of its functional requirements family of models, FRBR describes three main groups of entities, along with their relationships and attributes, as illustrated by Figure 3.2. The first group of entities forms the foundation of the model and characterises four levels of abstraction: Work, Expression, Manifestation, and Item (WEMI) [@denton_functional_2006 p. 231]. FRBR has had a significant impact on the development of RDA, which is loosely aligned with the principles and structures defined by the conceptual framework. However, FRBR is not a data model per se: it does not inform how to record bibliographic information in day-to-day practice and focuses heavily on textual resources[37] [@sprochi_where_2016 pp. 130-131]. Furthermore, @cossham_models_2017 [p. 11] asserts that FRBR and RDA ‘don’t align well with the ways that users use, understand, and experience library catalogues nor with the ways that they understand and experience the wider information environment’.

Figure 3.2: The FRBR Conceptual Framework. Adapted from @zou_constructing_2018 [p. 36]

A further important standard in the field of library science is the LRM, which was introduced as a comprehensive conceptual framework. It provides a broad understanding of bibliographic data and user-centric design principles, aligning with FRBR. LRM defines key entities, attributes, and relationships important for bibliographic searches, interpretation, and navigation – as shown in Figure 3.3. It operates at the conceptual level and does not dictate data storage methods. Attributes in LRM can be represented as literals or URIs. The model is presented in a structured document format to support LOD applications and reduce ambiguity. During its development, a parallel process created FRBRoo (see 3.2.3.3), a model that extends the original FRBR model by incorporating it into CIDOC-CRM. FRBRoo focuses on CH data and is more detailed than LRM, which is designed specifically for library data and follows a high-level, user-centric approach [@riva_ifla_2017 pp. 9-13]. The LRM model, known as LRMer[38], was released in 2020 by IFLA [@zeng_metadata_2022 p. 163].

Figure 3.3: Overview of Relationships in LRM [@riva_ifla_2017 p. 86]

BibFrame[39] is another metadata standard in the library domain. It was initiated around 2011 by the LoC as a successor to MARC, which had become obsolete [see @tennant_marc_2002] and remained invisible to web crawlers and search engines, preventing adequate discoverability of bibliographic resources [@sprochi_where_2016 p. 132]. BibFrame is a loosely RDF-based model [@sanderson_linked_2015] intended to improve the interoperability and discoverability of library resources. While the BibFrame model may not perfectly correspond with the WEMI entities outlined in FRBR, it is possible to effectively link BibFrame resources to FRBR entities, ensuring their compatibility [@sprochi_where_2016 p. 133]. BibFrame aims to transition from MARC by providing a more web-friendly framework, focusing on the relationships between entities, improving data sharing, and accommodating the digital environment. Conversely, @edmunds_bibframe_2023 argues that BibFrame is unaffordable and leads to elitism within libraries, with the main beneficiaries being well-funded institutions, particularly in North America, while a financial burden is placed on others.
Another important standard in the field of library science is the LRM, which was introduced as a comprehensive conceptual framework. It provides a broad understanding of bibliographic data and user-centric design principles, aligning with FRBR. LRM defines key entities, attributes, and relationships important for bibliographic searches, interpretation, and navigation – as shown in Figure 3.3. It operates at the conceptual level and does not dictate data storage methods. Attributes in LRM can be represented as literals or URIs. The model is presented in a structured document format to support LOD applications and reduce ambiguity. During its development, a parallel process created FRBRoo (see 3.2.3.3), a model that extends the original FRBR model by incorporating it into CIDOC-CRM. FRBRoo focuses on CH data and is more detailed than LRM, which is designed specifically for library data and follows a high-level, user-centric approach [@riva_ifla_2017 pp. 9-13]. A representation of the model, known as LRMer[38], was released in 2020 by IFLA [@zeng_metadata_2022 p. 163].

Figure 3.3: Overview of Relationships in LRM [@riva_ifla_2017 p. 86]

BibFrame[39] is another metadata standard in the library domain. It was initiated around 2011 by the LoC as a successor to MARC, which had become obsolete [see @tennant_marc_2002] and remained invisible to web crawlers and search engines, preventing adequate discoverability of bibliographic resources [@sprochi_where_2016 p. 132]. BibFrame is a loosely RDF-based model [@sanderson_linked_2015], intending to improve the interoperability and discoverability of library resources. While the BibFrame model may not perfectly correspond with the WEMI entities outlined in FRBR, it is possible to effectively link BibFrame resources to FRBR entities, ensuring their compatibility [@sprochi_where_2016 p. 133]. BibFrame aims to transition from MARC by providing a more web-friendly framework, focusing on the relationships between entities, improving data sharing, and accommodating the digital environment. Conversely, @edmunds_bibframe_2023 argues that BibFrame is unaffordable and leads to elitism within libraries, with the main beneficiaries being well-funded institutions, particularly in North America, while placing a financial burden on others. This approach, endorsed by bodies such as the LoC, is criticised for its high cost, impracticality, inequity and limited benefits for cataloguers, libraries, vendors and the public they serve. In addition, the author highlights BibFrame's lack of user-friendliness, regardless of the intended users, and criticises the notion of adopting Linked Data for its own sake, without substantial practical benefits.

3.2.3.2 Archival Metadata Standards

For archives, metadata standards like EAD[40] and ISAD(G)[41] have been pivotal. EAD, introduced in the mid-1990s – it originated in 1993 and the first version was released in 1998 – provides a hierarchical structure for representing information about archival collections, offering comprehensive descriptions that aid researchers, archivists, and institutions in managing and providing access to archival records. Its goal is to create a standard for encoding finding aids to improve the accessibility and understanding of archival collections [@pitti_encoded_1999 pp. 61-62]. On the other hand, ISAD(G), released in its first version in 1994 by ICA, offers a more general international standard for archival description, providing a framework for describing all types of archival materials, including fonds, sub-fonds, series, files, and items [@shepherd_application_2000 p. 57]. ISAD(G) aims to establish consistent and standardised archival description practices on a global scale, facilitating the sharing and exchange of archival information. PREMIS[42] is another metadata standard, initially released in 2005 – version 3.0, the latest specification, was published in 2016 – that focuses on the preservation of digital objects and consists of four interrelated entities: Object, Event, Agent, and Rights [@caplan_practical_2005 p. 111]. The main objective of PREMIS is to help institutions ensure the long-term accessibility of data by capturing key details about their creation, format, provenance, and preservation events. It is seen as an elaboration of OAIS, which categorises the information required for preservation into several functional entities and types of information package [see @bates_open_2009 pp. 425-426] – as illustrated by Figure 3.4 – through the mapping of preservation metadata onto the conceptual model [@zeng_metadata_2022 pp. 493-494].

Figure 3.4: OAIS Functional Model Diagram by @mathieualexhache_oais_2021

The latest development in metadata standards for archives is RiC, which has been developed since 2012 by ICA [@clavaud_ica_2021 pp. 79-80]. RiC is structured into four complementary parts [@ica_expert_group_on_archival_description_records_2023 p. 1] intended to cover and replace existing archival standards such as ISAD(G):

RiC Foundations of Archival Description: A brief description of the foundational principles and purposes of archival description.
RiC Conceptual Model: A high-level framework for archival description[43], as shown in Figure 3.5.
RiC-O: The ontology[44], which embodies a specific implementation of the conceptual model. It is formally expressed in OWL to make archival description available using LOD techniques – which facilitates extensions [see @mikhaylova_extending_2023] – and adheres to a conceptual vocabulary specific to archival description. It provides the ability to navigate and interpret complex archival holdings and to foster meaningful research and discovery. The ontology includes seven main groups of entities: Record, Agent, Rule, Event, Date, Place, and Instantiation.
RiC Application Guidelines: A part still in development at the time of writing, which will provide practitioners and software developers with guidance and examples for implementing the conceptual model and the ontology in records and archival management systems.

Figure 3.5: Global Overview of the Core Entities Defined by the RiC Conceptual Model. Slightly Adapted from https://github.com/ICA-EGAD/RiC-O

3.2.3.3 Museum and Gallery Metadata Standards

In the museum and gallery domain, various metadata standards and conceptual models have significantly contributed to the management, organisation, and accessibility of CH objects and artworks. Notable among these are CDWA, CCO, LIDO, CIDOC-CRM, as well as Linked Art. CDWA[45], developed in the mid-1990s and maintained by the Getty Vocabulary Program, and CCO[46], created by the VRA[47] and introduced in the early 2000s, primarily focus on describing art and cultural artefacts, providing a framework for recording essential details like artist, title, medium, date, and provenance. CDWA is a comprehensive set of guidelines for cataloguing and describing various cultural objects, including artworks, architectural elements, material culture items, collections of works, and associated images. While not a data model itself, it offers a conceptual framework for designing data models and databases, as well as for information retrieval. It later evolved into CDWA Lite, an XML schema for data harvesting purposes [@baca_categories_2017 pp. 1-2]. CCO comprises both rules and examples drawing on the CDWA categories and VRA Core 4.0 for describing, documenting, and cataloguing cultural works and their visual surrogates[48] [@coburn_cataloging_2010 pp. 17-18]. Both CCO and CDWA are standards that CIDOC[49] recommends and supports for museum documentation. LIDO[50] is a CIDOC standard introduced in the early 2000s which offers a lightweight XML-based serialisation used for describing museum-related information – as shown in Code Snippet 3.2. It provides a format for the interchange of data about art and CH objects, complementing CDWA and CCO as it integrates and extends CDWA Lite with elements of CIDOC-CRM [@stein_using_2019 p. 1025]. Ultimately, LIDO's goal is to enhance interoperability, accessibility, and the sharing of collection information, enabling institutions to connect and showcase their collections in diverse contexts [@coburn_lido_2010 p. 3]. LIDO is also a CIDOC Working Group, one of the groups created to tackle particular issues or areas of interest[51].
Code Snippet 3.2: Example of a LIDO Object in XML from @lindenthal_lido_2023

<lido:lido>
  <lido:lidoRecID lido:source="ld.zdb-services.de/resource/organisations/DE-Mb112" lido:type="http://terminology.lido-schema.org/lido00099">
    ld.zdb-services.de/resource/organisations/DE-Mb112/lido/obj/00076417
  </lido:lidoRecID>
  <lido:descriptiveMetadata xml:lang="en">
    <lido:objectClassificationWrap>
      <lido:objectWorkTypeWrap>
        <lido:objectWorkType>
          <skos:Concept rdf:about="http://vocab.getty.edu/aat/300033799">
            <skos:prefLabel xml:lang="en">oil paintings (visual works)</skos:prefLabel>
          </skos:Concept>
        </lido:objectWorkType>
      </lido:objectWorkTypeWrap>
    </lido:objectClassificationWrap>
    <lido:objectIdentificationWrap>
      <lido:titleWrap>
        <lido:titleSet>
          <lido:appellationValue lido:pref="http://terminology.lido-schema.org/lido00169" xml:lang="en">Mona Lisa</lido:appellationValue>
        </lido:titleSet>
      </lido:titleWrap>
    </lido:objectIdentificationWrap>
  </lido:descriptiveMetadata>
</lido:lido>

CIDOC-CRM[52], developed since 1996 by CIDOC and more specifically maintained by the CRM-SIG – which convenes quarterly[53] – is a formal and top-level ontology that offers a comprehensive conceptual framework for describing CH resources, allowing for a deep understanding of relationships between different entities, events, and concepts for museums [@doerr_cidoc_2003 pp. 75-76]. It aims to provide a common semantic framework for information integration, supporting robust knowledge representation and fostering collaboration and interoperability within the CH sector, as it can also mediate different resources from libraries and archives. The latest stable version of the conceptual model is version 7.1.2[54], published in June 2022, which comprises 81 classes and 160 properties[55] [see @bekiari_cidoc_2021]. Within the base ontology of CIDOC-CRM – or CRMBase – and despite the emergence of new developments and gradual changes, there is a fundamental and stable core that can be succinctly outlined. This fundamental structure acts as a basic orientation for understanding the way in which data is structured within CIDOC-CRM. Examining the hierarchical structure of CIDOC-CRM, one can identify the main top-level branches, namely:

E18 Physical Thing: This class comprises all persistent physical items with a relatively stable form, human-made or natural.
E28 Conceptual Object: This class comprises non-material products of our minds and other human-produced data that have become objects of a discourse about their identity, circumstances of creation or historical implication. The production of such information may have been supported by the use of technical devices such as cameras or computers.
E39 Actor: This class comprises people, either individually or in groups, who have the potential to perform intentional actions of kinds for which someone may be held responsible.
E53 Place: This class comprises extents in the natural space we live in, in particular on the surface of the Earth, in the pure sense of physics: independent from temporal phenomena and matter. They may serve to describe the physical location of things or phenomena or other areas of interest.
E2 Temporal Entity: This class comprises all phenomena, such as the instances of E4 Periods and E5 Events, which happen over a limited extent in time.
Complemented by entities tailored for the documentation of E41 Appellation and E55 Type, the structure – as shown in Figure 3.6 – provides a potent set of means to capture a broad range of general-level CH reasoning in a holistic manner [@bruseker_cultural_2017 pp. 111-112].

Figure 3.6: CIDOC-CRM Top-Level Categories by @bruseker_cultural_2017 [p. 112]

CRMBase is supplemented by a series of extensions – sometimes referred to as the CIDOC-CRM family of models – intended to support various types of specialised research questions and documentation, such as bibliographic records or geographical data. These compatible models[56], ordered alphabetically, include both works in progress and models to be reviewed by CRM-SIG[57]. They comprise the following:

CRMact[58]: An extension that defines classes and properties for integrating documentation records about plans for future activities and future events.
CRMarchaeo[59]: An extension of CIDOC-CRM created to support the archaeological excavation process and all the various entities and activities related to it.
CRMba[60]: An ontology for documenting archaeological buildings. Its primary purpose is to facilitate the recording of evidence and material changes in archaeological structures.
CRMdig[61]: An ontology to encode metadata about the steps and methods of production (‘provenance’) of digitisation products and synthetic digital representations such as 2D, 3D or even animated models created by various technologies.
CRMgeo[62]: An ontology intended to be used as a global schema for integrating spatio-temporal properties of temporal entities and persistent items. Its primary purpose is to provide a schema consistent with CIDOC-CRM to integrate geoinformation using the conceptualisations, formal definitions, encoding standards and topological relations.
CRMinf[63]: An extension of CIDOC-CRM that facilitates argumentation and inference in descriptive and historical fields. It serves as a universal schema for merging metadata related to argumentation and inference, primarily focusing on these disciplines.
CRMsci[64]: The Scientific Observation Model, an ontology that extends CIDOC-CRM for scientific observation, distinguishing the process from its results and providing a formal ontology for scientific data integration and research modelling.
CRMsoc[65]: An ontology for integrating data about social phenomena and constructs that are of interest in the humanities and social sciences, based on the analysis of documentary evidence.
CRMtex[66]: An extension of CIDOC-CRM created to support the study of ancient documents by identifying relevant textual entities and by modelling the scientific process related to the investigation of ancient texts and their features.
FRBRoo[67]: An ontology intended to capture and represent the underlying semantics of bibliographic information, which interprets the conceptualisations of the FRBR framework.
PRESSoo[68]: An ontology intended to capture and represent the underlying semantics of bibliographic information about continuing resources, and more specifically about periodicals (journals, newspapers, magazines, etc.). PRESSoo is also an extension of FRBRoo.

Figure 3.7 shows CRMBase and eight of the extensions previously outlined in a pyramid shape, where the lower one goes in the pyramid, the more specialised the concepts.

Figure 3.7: CIDOC-CRM Family of Models. Diagram done and provided by Maria Theodoridou (Institute of Computer Science, FORTH)
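To give a concrete flavour of this event-centric modelling style, the following sketch – Python with rdflib, using placeholder instance URIs under example.org; the class and property identifiers follow the RDFS encoding of CRMBase in its 7.1.x naming (e.g. E22 Human-Made Object), so exact labels may differ in other versions – links a painting to the production event that created it, the actor who carried it out, and a time-span.

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

CRM = Namespace("http://www.cidoc-crm.org/cidoc-crm/")
EX = Namespace("https://example.org/collection/")   # placeholder instances

g = Graph()
g.add((EX.painting1, RDF.type, CRM["E22_Human-Made_Object"]))
g.add((EX.production1, RDF.type, CRM["E12_Production"]))
g.add((EX.painter1, RDF.type, CRM["E21_Person"]))
g.add((EX.timespan1, RDF.type, CRM["E52_Time-Span"]))

# Events mediate between things, actors, and time:
g.add((EX.painting1, CRM["P108i_was_produced_by"], EX.production1))
g.add((EX.production1, CRM["P14_carried_out_by"], EX.painter1))
g.add((EX.production1, CRM["P4_has_time-span"], EX.timespan1))
g.add((EX.timespan1, RDFS.label, Literal("early 16th century")))

print(g.serialize(format="turtle"))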
Linked Art[69], a recent addition to this landscape, is a community-driven initiative and a metadata application profile that has been in existence since the end of 2016 [@raemy_ameliorer_2022 pp. 136-137]. This community – recognised as a CIDOC Working Group – has created a common Linked Data model based on CIDOC-CRM for describing artworks, their relationships, and the activities around them (see 3.5.5).

3.2.3.4 Cross-domain Metadata Standards

There are a few cross-domain standards that have been used to describe CH resources. For instance, the Dublin Core Elements, containing the original core set of fifteen basic elements, and the Dublin Core Metadata Terms[70], its extension, are widely used metadata standards for describing CH resources. They provide metadata properties and classes that are applicable to a wide range of resources [@weibel_dublin_2000]. Another good example is the EDM, which has been specified so that national, regional and thematic aggregators in Europe can deliver resources of content providers to Europeana [see @charles_enhancing_2015; @freire_technical_2019]. Despite the presence of cross-domain standards and efforts to map between standards, whether from one version to another or across different domains, reconciling metadata from various sources remains a significant challenge in the CH sector. Institutions may collect metadata in different ways, using different standards and schemas, making it difficult to merge and compare metadata from different sources. Additionally, metadata may be incomplete, inconsistent, or contain errors, further complicating data reconciliation. To address these challenges, standardised, interoperable metadata are necessary to enable data sharing and reuse. While the use of different metadata standards can present challenges for data reconciliation, the adoption of standardised, interoperable metadata can facilitate data sharing and reuse, promoting the long-term preservation and accessibility of CH resources. Controlled vocabularies – included in what @zeng_metadata_2022 [pp. 24-25] calls ‘standards for data value’ – such as those maintained by the Getty Research Institute[71] (the AAT, the TGN, and the ULAN), as well as various kinds of KOS (see 3.2.4), also play an important role here. These vocabularies provide a common language for describing CH objects and can improve the interoperability of metadata across different institutions and communities. Alongside metadata reconciliation also comes the question of aggregation. Apart from LIDO in museums, the general and current operating model for aggregating CH (meta)data is still OAI-PMH [see @raemy_enabling_2020], an XML-based standard that was initially specified in 1999 and updated in 2002 [@lagoze_open_2002]. Alas, OAI-PMH does not align with contemporary needs [@van_de_sompel_reminiscing_2015], and there are now some alternative, web-based technologies for harvesting resources that are slowly being leveraged, such as AS [@snell_activity_2017], a W3C syntax and vocabulary for representing activities and events in social media and other web applications. It can also be easily extended and used in different contexts, as is the case with the IIIF Change Discovery API (see 3.5.3.3) or with ActivityPub [@lemmer-webber_activitypub_2018], a decentralised W3C protocol leveraged by Mastodon[72], a federated and open-source social network.
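Because OAI-PMH is plain HTTP returning XML, a minimal harvest needs no dedicated tooling. The sketch below – Python, with a placeholder repository URL – issues a ListRecords request for Dublin Core records; a production harvester would additionally follow resumptionToken elements to page through large result sets.

import requests
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Placeholder endpoint; any repository exposing the standard oai_dc
# metadata prefix answers the same way.
response = requests.get("https://repository.example.org/oai",
                        params={"verb": "ListRecords", "metadataPrefix": "oai_dc"})
root = ET.fromstring(response.content)

for record in root.iter(f"{OAI}record"):
    titles = [t.text for t in record.iter(f"{DC}title")]
    identifiers = [i.text for i in record.iter(f"{DC}identifier")]
    print(titles, identifiers)

# A full harvester would read the <resumptionToken> element and issue
# follow-up requests until the token is empty.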
Overall, the evolution of metadata standards in the CH domain paves the way for a more interconnected and accessible digital environment, thereby providing better access to disparate collections and facilitating cross-domain reconciliation. This transformation is complemented by a growing emphasis on web-based metadata aggregation technologies that are better suited to today's needs.

3.2.4 Knowledge Organisation Systems

KOS, also known as concept systems or concept schemes, encompass a wide range of instruments in the area of knowledge organisation. They are distinguished by their specific structures and functions [@mazzocchi_knowledge_2018 p. 54]. KOS include authority files, classification schemes, thesauri, topic maps, ontologies, and other related structures. Despite their differences in nature, scope and application, all share a common goal: to facilitate the structured organisation of knowledge and the classification of information. According to @zeng_metadata_2022 [p. 284], ‘KOS have a more important function: to model the underlying semantic structure of a domain and to provide semantics, navigation, and translation through labels, definitions, typing relationships, and properties for concepts’. This overarching intent underpins the practice of information management and retrieval. The term KOS ‘became even more popular after the encoding standard Simple Knowledge Organization System (SKOS) was recommended by W3C’, although the use of such systems can be traced back over 100 years, while others have been created since the advent of the web [@zeng_metadata_2022 p. 188]. According to @hill_integration_2002 [pp. 46-47, citing @hodge_systems_2000], KOS can be divided into four main groups: term lists, metadata-like models, classification and categorisation, as well as relationship models. Term lists encompass authority files, dictionaries, and glossaries, serving as controlled sources for managing terms, definitions, and variant names within a knowledge organisation framework. Metadata-like models encompass directories and gazetteers, offering lists of names and associated contact information as well as geospatial dictionaries for named places, which can be extended to represent events and time periods. In the classification and categorisation domain, one finds categorisation schemes and classification schemes that organise content, subject headings that represent controlled terms for collection items, and taxonomies that group items based on specific characteristics. Finally, relationship models feature ontologies, semantic networks, and thesauri, each capturing complex relationships between concepts and terms [@hill_integration_2002; @zeng_knowledge_2008]. Figure 3.8 represents an overview of the structure and functions of these four main groups, also showcasing the subcategories of KOS previously mentioned. In this figure, the x characters indicate the extent to which each type of KOS embodies five key functions identified by @zeng_knowledge_2008, such as eliminating ambiguity or controlling synonyms. In this subsection, I will explore four subcategories of KOS, each representing a continuum from a more linear to a more structured network. These include folksonomy, taxonomy, thesaurus, and ontology. These KOS have been selected due to their significant impact on the organisation and interlinking of data within the contexts of CHI practices and LOD. Furthermore, the intent of these systems is to help bridge the gap between human understanding and machine processing.
Figure 3.8: Overview of the Structures and Functions of KOS [@zeng_knowledge_2008 p. 161]

3.2.4.1 Folksonomy

Positioned at one end of the organisational spectrum, folksonomies, also known as community tagging or social bookmarking, are characterised by their user-generated nature. These systems rely on individual users tagging content with keywords or tags that reflect their personal perspectives and preferences. Folksonomies are hard to integrate or reconcile with other systems [@zeng_metadata_2022 p. 401]. However, they do provide a wealth of source material for studying social semantics [@zeng_metadata_2022 p. 403] and can be maintained in parallel with more structured KOS.

3.2.4.2 Taxonomy

Moving towards the centre of the spectrum, taxonomies present a more structured approach to knowledge organisation [@zeng_knowledge_2008 p. 169]. Taxonomies employ hierarchical classifications to systematically categorise information into distinct classes and sub-classes, or in a parent/child relationship [@saa_dictionary_taxonomy_2023] – as shown by Code Snippet 3.3 [@niso_guidelines_2010 p. 18]. Taxonomy, in this context, extends beyond mere categorisation; it also establishes relationships.

Code Snippet 3.3: Taxonomy Hierarchy

Chemistry
  Physical Chemistry
    Electrochemistry
      Magnetohydrodynamics

3.2.4.3 Thesaurus

Moving further along the spectrum, thesauri offer a more detailed and formalised method of organisation. They include not only hierarchical relationships but also explicit semantic connections between terms, making them valuable tools for information retrieval. As defined by @niso_guidelines_2010 [p. 9]: A thesaurus is a controlled vocabulary arranged in a known order and structured so that the various relationships among terms are displayed clearly and identified by standardized relationship indicators. For instance, consider a thesaurus related to photography, which encompasses categories for various aspects of photography, including photographic techniques, equipment, and materials. Within this thesaurus, ‘Kodachrome’ could be categorised not only as a specific type of colour film but also as a distinct photographic process. As a type, it could fall under the sub-category of ‘colour film photography’, and as a process, it would fit within the broader framework of ‘photographic techniques’. The AAT, commonly employed in the CH domain, stands as a significant example of a thesaurus [@harpring_development_2010 p. 67]. Homosaurus[73] is another example of a thesaurus, with a distinct focus on enhancing the accessibility and discoverability of LGBTQ+ resources and related information. Leveraging Homosaurus in metadata can effectively contribute to diminishing biases present in such data, an essential step in promoting inclusivity and equity within information systems [see @hardesty_mitigating_2021].
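Thesaural relationships of this kind are commonly serialised today with SKOS, the W3C recommendation mentioned in 3.2.4. The following sketch – Python with rdflib, using a made-up namespace and concept identifiers rather than actual AAT records – encodes the Kodachrome example with its two broader concepts.

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("https://example.org/photo-thesaurus/")   # placeholder scheme

g = Graph()
g.bind("skos", SKOS)
for concept in ("kodachrome", "colourFilmPhotography", "photographicTechniques"):
    g.add((EX[concept], RDF.type, SKOS.Concept))

g.add((EX.kodachrome, SKOS.prefLabel, Literal("Kodachrome", lang="en")))
# A thesaurus records explicit hierarchical relationships between terms:
g.add((EX.kodachrome, SKOS.broader, EX.colourFilmPhotography))   # as a type of colour film
g.add((EX.kodachrome, SKOS.broader, EX.photographicTechniques))  # as a photographic process

print(g.serialize(format="turtle"))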
3.2.4.4 Ontology

At the structured end of the spectrum, ontologies define complex relationships and attributes between concepts, whereby a series of concepts have been chosen to express what we understand, so that a computer can start making sense of our world. Ontologies are formalised KOS, enabling advanced data integration and KR for more sophisticated applications. The term is drawn from philosophy, where ontology is a discipline concerned with studying the nature of existence, as articulated by @gruber_translation_1993 [pp. 199-200]: An ontology is an explicit specification of a conceptualization. The term is borrowed from philosophy, where an ontology is a systematic account of Existence. For knowledge-based systems, what “exists” is exactly that which can be represented.

There are different kinds of ontologies, including axiomatic formal ontologies, foundational ontologies, and domain-specific ontologies [@beretta_interoperabilite_2022]. These different types of ontologies cater to various knowledge representation needs. Foundational ontologies, such as DOLCE[74], provide a high-level framework for modelling knowledge and offer a comprehensive system for representing entities, qualities, and relationships [see @masolo_wonder_2003; @borgo_dolce_2022]. DLs, a family of formal KR languages, also play a key role in developing ontologies and serve as the foundation for OWL (see 3.4.2), notably by providing a logical formalism. DLs are characterised by their ability to provide substantial expressive power that goes well beyond propositional logic, while maintaining decidable reasoning [@chang_abox_2014]. In computer science, the concepts of ABox and TBox, both sets of statements in KBs, are relevant to the structuring and enrichment of KGs [@giacomo_tbox_1996][75]. The ABox, representing the ‘assertion’ or ‘instance’ level, encapsulates concrete data instances and their relationships, contributing to the factual knowledge of a given system. Conversely, the TBox, representing the ‘terminology’ or ‘schema’ level, defines the conceptual framework and hierarchies that govern the relationships and attributes of the instances. These two complementary components work in harmony to improve data interoperability, reasoning and knowledge sharing. Figure 3.9 depicts a high-level overview of a KB representation system.

Figure 3.9: Knowledge Base Representation System Based on @patron_embedded_2011 [p. 205]

Consider a scenario around artwork provenance held in a museum. The ABox strives to encapsulate the rich narratives of individual artworks, tracing their journey through time, ownership transitions and exhibition travels. At the same time, the TBox creates a conceptual scaffolding, imbued with classes such as Artwork, Creator, and Exhibition, painting an abstract portrait that contextualises each artefact within a broader cultural tapestry. It is here that the DL comes in, harmonising the symphony with its logical relationships and axioms, i.e. rules or principles widely accepted as obviously true [@baader_13_2007]. A DL knowledge base is represented as 𝒦 = (𝒯, ℛ, 𝒜), where:

𝒯 represents the TBox, defining the conceptual framework, which encompasses the hierarchical relationships, classes, and concepts within the KB.
ℛ represents the set of binary roles, delineating the relationships and connections between individuals or instances in the domain. These roles facilitate the understanding of how entities relate to one another within the KB.
𝒜 represents the ABox, encompassing the specific assertions or instances in the KB.

This symbiotic interplay ensures that the provenance of each artwork is not just a static account, but a dynamic, interconnected narrative. The ABox-TBox relationship thrives in the realm of reasoning. Imagine an axiom embedded in the TBox: ‘A work of art presented in an exhibition curated by a distinguished patron is of heightened cultural significance’, which can be phrased as a DL subsumption along the lines of: Artwork ⊓ ∃presentedIn.(Exhibition ⊓ ∃curatedBy.DistinguishedPatron) ⊑ CulturallySignificant. This axiom serves as a beacon to guide the system’s reasoning.
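As a minimal sketch of how such an axiom could drive inference – plain Python with the rule hard-coded as a single forward-chaining step, and all individual and role names hypothetical; a real system would delegate this to a DL reasoner operating over an OWL ontology – consider:

# TBox (terminology): classes Artwork, Exhibition, DistinguishedPatron;
# roles presented_in, curated_by. The single axiom: an Artwork presented in
# an Exhibition curated by a distinguished patron is CulturallySignificant.

abox = [
    ("artwork1", "type", "Artwork"),
    ("artwork1", "presented_in", "exhibition1"),
    ("exhibition1", "type", "Exhibition"),
    ("exhibition1", "curated_by", "patron1"),
    ("patron1", "type", "DistinguishedPatron"),
]

def holds(s, p, o, facts):
    return (s, p, o) in facts

def apply_axiom(facts):
    """Forward-chain the TBox axiom over the ABox assertions."""
    inferred = set()
    for (s, p, o) in facts:
        if p == "presented_in" and holds(s, "type", "Artwork", facts) \
                and holds(o, "type", "Exhibition", facts):
            for (s2, p2, o2) in facts:
                if s2 == o and p2 == "curated_by" \
                        and holds(o2, "type", "DistinguishedPatron", facts):
                    inferred.add((s, "type", "CulturallySignificant"))
    return inferred

print(apply_axiom(abox))
# {('artwork1', 'type', 'CulturallySignificant')}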
When an ABox instance of an artwork is woven into an exhibition curated by a prominent authority, the DL-informed engine responds by inferring an enriched cultural value that resonates beyond the artefact itself. This is where the TBox takes data and gives it life, producing insights that transcend the boundaries of individual instances. The KB, 𝒦, captures this orchestration, encapsulating the logical relationships for meaningful interpretation and knowledge discovery. Overall, the relationship between ABox and TBox in DL is vital for achieving semantic clarity, enabling meaningful data integration, and facilitating advanced reasoning mechanisms. The museum provenance scenario showcases a precisely orchestrated convergence of assertion, terminology, and rigorous logical reasoning. This engenders a computational landscape where historical artefacts intricately mesh within the complex network of human history's data structures, seamlessly aligning with the underlying framework of algorithmic representation. These components enable software developers to harmonise disparate datasets, extract insightful knowledge, and support decision-making processes across a wide range of domains. In essence, the use of DL, ABox, and TBox in ontological KR enhances interoperability between different systems and allows for sophisticated reasoning and decision support.

Moving beyond these foundational concepts, it is worth considering the work of @ehrlinger_towards_2016, who address the need for a clear and standardised definition of KGs. They highlight the term's varied interpretations since its popularisation by Google in 2012 and propose a definitive, unambiguous definition to foster a common understanding and wider adoption in both academic and commercial realms. They define a KG as follows: ‘A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge’. This definition crystallises the essence of KGs as dynamic and integrative systems that not only store but also process and enrich data through advanced reasoning. This conceptualisation underlines the transformative potential of KGs in various domains, bridging the gap between raw data and actionable insights. Finally, it is important to recognise that the importance of ontologies extends beyond individual systems. Shared ontologies are a cornerstone of semantic interoperability, thus facilitating a paradigm shift in the way systems and applications communicate. As @sanderson_rdf_2013 argues, ‘shared ontologies increases semantic interoperability’ and ‘shared identity makes it possible for graph to merge serendipitously’. This shared understanding ensures that various entities can seamlessly connect and engage in meaningful interactions.

3.3 Trends, Movements, and Principles

Technological trends, scientific movements, and guiding principles have played a crucial role in shaping the landscape of contemporary research. In recent years, there has been an increased emphasis on the need for academic and CH practices to be more transparent, inclusive, and accountable. This shift reflects a broader trend towards integrating advanced technological solutions and open-science principles in heritage management. As such, understanding the evolution of CH becomes imperative to comprehend how these practices have adapted and transformed in response to these guiding trends. The evolution of CH has been characterised by a series of technological and methodological shifts.
Initially, the primary focus was on digitising physical artefacts to preserve information from degrading originals. This phase was crucial for transitioning tangible CH into a digital format, mitigating the risk of loss due to physical degradation. Following this, efforts shifted towards ensuring the persistence of digitised resources. This stage involved addressing challenges related to digital preservation, including data degradation and format obsolescence, to ensure the longevity of digital cultural assets. The advent of open data principles marked the next phase in CH development. This approach facilitated broader access to information, aligning with contemporary values of transparency and inclusivity in governmental, academic, and cultural contexts. Subsequently, the focus expanded to enhancing the utility of this data. This stage involved contextualising and enriching CH data, thereby increasing their applicability and relevance across various domains. The current frontier in CH involves developing applications that leverage rich CH data. These applications serve not only as tools for engagement and education but also as justifications for the ongoing costs associated with data storage and archival. They illustrate the tangible benefits derived from preserving heritage resources, encompassing both cultural and economic returns. In summary, the trajectory of CH development mirrors broader technological and societal trends, transitioning from preservation to active utilisation. This progression underscores the dynamic nature of research and CH processes, highlighting the evolving requirements for transparency, inclusivity, and accountability in CH management.

While automation has significantly enhanced the efficiency of digitisation processes in CH, cataloguing and indexing remain complex challenges. The intricacies involved in accurately understanding and categorising resources necessitate more than just technological solutions; they require context-aware and culturally sensitive approaches. Here, ML offers promising perspectives. ML, particularly in its advanced forms like deep learning, can assist in cataloguing and indexing by analysing large datasets to identify patterns, categorise content, and even suggest metadata. This can be particularly useful in handling large volumes of CH data, where manual processing is time-consuming and prone to human error. Typical applications of ML in this field include image recognition for identifying and classifying visual elements in artefacts, NLP for analysing textual content, and pattern recognition for sorting and organising data based on specific characteristics. Furthermore, prospective developments may entail the refinement of metadata mapping and the enhancement of quality control mechanisms. Moreover, ML algorithms can be trained to recognise stylistic elements, historical contexts, and other nuances that are essential for accurate cataloguing in CH. However, it is crucial to note that the effectiveness of ML depends heavily on the quality and diversity of the training data. Biases in this data can lead to inaccuracies in cataloguing and indexing. Thus, a collaborative approach, where ML is supplemented by expert human oversight, is often the most effective strategy.
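Such a human-in-the-loop workflow can be sketched as follows – plain Python, where classify_image is a hypothetical stand-in for any trained image-classification model and the confidence threshold is an assumption to be tuned per collection; the point is that model output remains a suggestion until a cataloguer confirms it.

CONFIDENCE_THRESHOLD = 0.85   # assumed cut-off, to be tuned per collection

def classify_image(path: str) -> list[tuple[str, float]]:
    """Hypothetical stand-in for a trained image-classification model."""
    return [("portrait", 0.93), ("studio photograph", 0.88), ("snow", 0.41)]

def suggest_tags(path: str) -> list[str]:
    # Keep only confident predictions, and only as *suggestions*,
    # never as final metadata.
    return [label for label, conf in classify_image(path)
            if conf >= CONFIDENCE_THRESHOLD]

def curate(path: str, approve) -> list[str]:
    # `approve` is the human reviewer: a callable deciding per suggested label.
    return [tag for tag in suggest_tags(path) if approve(tag)]

# Approval is simulated here; in practice a cataloguer reviews each suggestion.
print(curate("example.tif", approve=lambda tag: tag != "snow"))
# ['portrait', 'studio photograph']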
Overall, this section provides a comprehensive overview of three technological trends[26:1] as well as five key scientific movements and guiding principles that are shaping research and how universities and GLAMs should provide environments, services, and tools with a view to collecting and disseminating content. By exploring each of these trends, movements, and principles, we can gain a deeper understanding of how research and CH processes are permeated by dynamic movements and how resources can be made more transparent, inclusive and accountable, as well as how data can be made available to human and non-human users.

3.3.1 Current and Emerging Technological Trends in Cultural Heritage

I will explore some current and emerging technological trends in CH, organised into three components: Linked Data, big data, and AI. Each represents a critical driver shaping the landscape and practices of heritage data. The three trends have been around for a few decades, with the ‘Linked Data’ principles and underlying standards coming from the late 1990s, ‘big data’ being coined in 1990, and AI in 1956. Before considering the trends discussed hereafter, note that current technological developments do not exist in isolation, but tend to intertwine and act synergistically. A vivid example of this interplay can be seen in AI and its latent impact on the semantic web, particularly in facilitating more efficient querying and crawling processes, such as the LinkedDataGPT proof-of-concept service[76] from Liip for the City of Zurich, which combines ChatGPT – a generative AI solution – with a Linked Data portal to facilitate querying open datasets [@stocker_use_2023]. Inversely, AI can be fed by data on the web to learn and reason, as outlined by @gandonWebScienceArtificial2019.

3.3.1.1 Linked Data

Linked Data, and more precisely LOD, is a set of design principles adhering to RDF, and a significant approach to interconnecting data on the web in order to make semantic queries more useful [@berners-lee_semantic_2001]. In other words, this standardisation allows data to be not only linked, but also openly accessible and reusable. As noted by @gandonWebScienceArtificial2019 [p. 115, citing @gandon_pour_2017]: The Web was initially perceived and used as a globally distributed hypertext space for humans. But from its inception, the Web has always been more: its hypermedia architecture is in fact linking programs world-wide through remote procedure calls. This deeper understanding of the web's architecture as a conduit for linking programs on a global scale holds profound implications. It signifies that the web is not merely a medium for accessing information but a dynamic environment where data-driven programs interact, exchange data, and collaborate across geographical boundaries. In this context, Linked Data emerges as a powerful enabler, providing a structured and standardised approach for these programs to communicate and share meaningful data [@bizer_linked_2008]. In the context of CH, institutions such as museums, libraries and archives can publish their collections using Linked Data principles, enabling a web of linked information that is accessible to all. As this dissertation's main topic revolves around Linked (Open) (Usable) Data, two dedicated sections have been written within this literature review in Section 3.4 and Section 3.5. Beyond formal LOD, CHIs may also link their databases or collections in more informal ways.
This interconnection may take the form of shared metadata, common identifiers, or simply hyperlinks. These links can enhance the user experience by supporting more seamless navigation between related items or pieces of information. For instance, a parallel strategy is the use of graph-based data representation, i.e. property graphs, which consist of a set of objects or vertices and a set of arrows or edges connecting the objects, and which are most likely not RDF-compliant [see @bermes_modelisons_2023]. Graph databases, such as Neo4j[77], which is quite prevalent in DH [see @webber_programmatic_2012; @drakopoulos_semantically_2019; @darmont_data_2020], allow for efficient storage and retrieval of interconnected data through nodes representing entities and relationships linking them.

3.3.1.2 Big Data

Big data refers to extremely large and complex datasets that exceed the capabilities of traditional data processing methods and tools. It encompasses a massive volume of structured, semi-structured and unstructured data that is currently flooding across a variety of sectors, companies and organisations [see @emmanuel_defining_2016]. The characteristics of big data are often described by the three Vs model [@laney_3d_2001]:

Volume: Big data refers to a massive amount of data. This can encompass a spectrum of data sizes, extending from GB and TB to PB[78] and beyond. The sheer size of the data is a key aspect of big data, making traditional database systems inadequate for storage and analysis.
Velocity: Data is being generated and collected at an unprecedented rate. Social media posts, sensor data, online transactions and more are constantly being generated, requiring real-time or near real-time processing and analysis.
Variety: Big data comes in a variety of formats, including structured data (e.g. databases), semi-structured data (e.g. XML, JSON) and unstructured data (e.g. text, images, video). The variety of data types requires flexible processing methods.

In addition to the three Vs model, two more characteristics are often included [@saha_data_2014 p. 1294]:

Veracity: It refers to the quality of the data, including its accuracy, reliability and trustworthiness. Big data sources can be inherently uncertain or inaccurate, and addressing data quality is a critical challenge.
Value: Extracting value and actionable insights from big data is the ultimate goal. Analysing and interpreting big data should lead to better decision-making, improved business strategies, as well as enhanced UX[79].

Regarding the two latter dimensions, @debattista_linked_2015 argue that Linked Data is the most suitable technology to increase the value of data over conventional formats, thus contributing towards the value challenge in big data. As for veracity, they describe a semantic pipeline with eight key metrics to address the veracity dimension. Building on this technological foundation, the integration of Linked Data and big data analytics takes centre stage. Big data analytics can be employed on CH content to uncover insights and correlations that can be used in decision-making. @barrile_big_2022 [p. 2708] highlight the transformative potential of using big data by investigating how analytical approaches can enhance conservation strategies, aid resource allocation and optimise the management of CH resources. @poulopoulos_digital_2022 [pp. 188-189] emphasise that emerging technology trends, including big data, have a significant impact on related research areas such as CH.
Big data primarily originates from sources such as social media, online gaming, data lakes[80], logs and frameworks that generate or use significant amounts of data. They stress that the incorporation of multi-faceted analytics in the CH domain is an area of active research, and present a data lake that provides essential user and data/knowledge management functionalities. However, they emphasise a crucial consideration: the need to bridge the theoretical foundations of disciplines such as cultural sociology with the technological advances of big data.

3.3.1.3 Artificial Intelligence

The term AI was coined by John McCarthy, an American computer scientist and cognitive scientist, during the 1956 Dartmouth Conference, which is often considered the birth of AI as an academic field [@andresen_john_2002 p. 84]. According to the @oxford_english_dictionary_artificial_2023, AI is described as follows: The capacity of computers or other machines to exhibit or simulate intelligent behaviour; the field of study concerned with this. In later use also: software used to perform tasks or produce output previously thought to require human intelligence, esp. by using machine learning to extrapolate from large collections of data. While AI is not the central focus of my PhD thesis, I acknowledge its impact in several instances. As a rapidly developing technology, AI has the potential to significantly transform various aspects of society, including the way we describe, analyse, and disseminate CH resources. It is worth mentioning that I endeavour to engage in a broader discourse concerning the domain of AI. In this context, I use the acronym AI to talk about the overarching domain or its ethics, and ML to discuss the specifics of methodologies and algorithmic approaches, while refraining from delving into the intricacies of deep learning, which is a distinct subdomain within ML. AI and ML offer great potential for digitising, curating and analysing CH, leveraging the vast digital datasets from CHIs. Some examples include text recognition mechanisms using OCR and HTR, NLP and NER for enriching unstructured text, as well as object detection methods for finding patterns within still and moving images [@neudecker_cultural_2022; @sporleder_natural_2010]. Textual works can also be analysed, for instance through sentiment analysis [see @susnjak_applying_2023], and generated using LLMs – a variety of NLP models, such as BERT or ChatGPT, which predict the likelihood of a word given the previous words present in recorded texts. However, challenges such as data quality and biases in AI persist [@neudecker_cultural_2022]. In addition, there are still uncertainties regarding the licensing and reuse of CH datasets by ML algorithms[81]. @neudecker_cultural_2022 emphasises the importance of well-curated digitised CH resources that are openly licensed, accompanied by relevant metadata, and accessible through APIs or download dumps in various formats. These curated resources have the potential to address the existing gap in this domain. Building on the theme of enhancing CH through digital technologies, @mcgillivray_digital_2020 explore the synergies and challenges found at the intersection of DH and NLP. DH is aptly described as ‘a nexus of fields within which scholars use computing technologies to investigate the kinds of questions that are traditional to the humanities […] or who ask traditional kinds of humanities-oriented questions about computing technologies’ [@fitzpatrick_reporting_2010].
This broad characterisation encapsulates the transformative potential of digital tools, including ML techniques, in enriching humanities research. @mcgillivray_digital_2020 highlight the critical need for bridging the communication gap between DH and NLP to drive progress in both fields. They propose increased interdisciplinary collaboration, encouraging DH researchers to actively utilise NLP tools to refine their research methodologies. A primary challenge in this convergence is the application of NLP to the complex, historical, or noisy texts often encountered in DH research. They conclude by advocating for stronger cooperation between practitioners in these fields. This collaborative effort is vital for harnessing the full potential of ML in analysing and interpreting CH. The use of ML scripts in the context of CH – and beyond – is inherently limited by their applicability, namely when dealing with historical photographs. In such cases, the use of algorithms that are mostly trained on and grounded in contemporary image data becomes quite incongruous due to the dissimilarity in temporal contexts. This dilemma is exemplified by datasets such as Microsoft's Common Objects in Context (COCO)[82] [@fleet_microsoft_2014], where the available data are predominantly contemporary photographic content, which is misaligned with the historical nuances inherent in most digitised CH images. @coleman_managing_2020 corroborates that a sound approach would be for ML practitioners to collaborate with libraries, as they can draw practical lessons from critical data studies and the thoughtful integration of AI into their collections, using guidelines from DH. She also argues that simply handing over datasets would be a disservice to library patrons and that ‘Librarians need to master the instruments of AI and employ them both to learn more about their own resources—to see and analyze them in new ways—and to help shape applications of AI with the expertise and ethos of libraries.’ Ethical concerns, particularly regarding social biases and racism, are prevalent in technologies like ImageNet, where facial recognition may yield AI statements with strong negative connotations [@neudecker_cultural_2022]. Addressing this, @gandonWebScienceArtificial2019 suggest the production of AI services that are ‘benevolent-by-design for the good of the Web and society’. Furthermore, @floridi_good_2023 introduces the double-charge thesis, asserting that all technology design is a moral act, challenging the neutrality thesis. He emphasises that technologies are not neutral and can be influenced by a dynamic equilibrium of values, predisposing them towards morally good or evil directions. As mentioned previously, ML training datasets are often not representative enough to be properly leveraged in the CH sector [@strien_introduction_2022]. Fine-tuning is now a topic, though, and new ground truth datasets have been created and tailored for the needs of CH, such as Viscounth[83], a large-scale VQA dataset – i.e. a dataset containing open-ended questions about images, which requires an understanding of vision, language and commonsense knowledge to answer [@goyal_making_2017] – for CH in English and Italian [see @becattini_viscounth_2023]. @jaillant_unlocking_2022 argue that the governance of AI ought to be carried out in partnership with GLAM institutions.
However, while this collaboration has been proposed as a promising way forward, it still requires further exploration and evaluation, particularly with regard to the specific challenges and opportunities that it presents. On the one hand, the involvement of GLAMs in AI governance could enhance the development of digital CH projects that promote social justice and equity. On the other hand, this collaboration raises several challenges, such as the need to address issues of privacy, data protection, and intellectual property rights, and to ensure that the values and perspectives of GLAM professionals are adequately represented in the development of AI algorithms and systems. Therefore, it is crucial to examine the specific challenges and opportunities of this collaboration and to develop appropriate frameworks and guidelines that enable effective and ethical governance of AI in the GLAM sector. One of the platforms that addresses these issues is AI4LAM, an international and participatory community focused on advancing the use of AI in, for and by libraries, archives, and museums[84]. The initiative was launched by the National Library of Norway and Stanford University Libraries in 2018, inspired by the success of the IIIF community. Another agency is the AEOLIAN Network[85], AI for Cultural Organisations, which investigates the role that AI can play in making born-digital and digitised cultural records more accessible to users [@jaillant_applying_2023 p. 582]. As an illustrative case, the LoC's exploration of ML technologies, as highlighted by @allen_why_2023, demonstrates a strategic commitment to enhancing the accessibility and utility of its diverse collections. This initiative reflects the LoC's acknowledgement of the transformative potential of ML, balanced with a cautious approach due to the necessity for accurate and responsible information stewardship. The LoC faces several challenges in applying ML, particularly the limitations of commercial AI systems in handling its varied materials and the requirement for substantial human intervention. This cautious exploration of ML is indicative of a broader trend in CHIs, where maintaining a balance between embracing technological advancements and preserving authenticity and integrity is crucial. The specific experiments and projects undertaken by the LoC in the realm of ML are diverse and illustrative of the institution's comprehensive approach to innovation. For instance, image recognition systems have been tested for identifying and classifying visual elements in artefacts, a task that requires a nuanced understanding of historical and cultural contexts. In another initiative, speech-to-text technology was employed to transcribe spoken word collections, confronting challenges such as accent recognition and audio quality variation. Additionally, the LoC explored the potential of ML in enhancing search and discovery capabilities through projects like Newspaper Navigator[86], which aimed to identify and extract images from digitised newspaper pages. These experiments not only highlight the potential of ML in transforming the way the LoC manages and disseminates its collections but also reveal the complexities and limitations inherent in these technologies. As @allen_why_2023 notes, the ongoing research and experimentation in ML at the LoC are critical in revolutionising access and discovery in the cultural heritage sector.
These efforts, while facing challenges, represent a diligent integration of advanced technologies, upholding principles of responsible custodianship and setting a precedent for similar institutions globally in the adoption and adaptation of ML and AI in CHIs. The integration of LLMs and KGs presents a groundbreaking opportunity, particularly within the realm of CHIs, where there is already considerable expertise. This is aptly demonstrated in the work of @pan_large_2023, which elucidates the harmonisation between explicit knowledge and parametric knowledge, i.e. knowledge derived from patterns in data, as learned by models such as LLMs. The authors highlight three key areas for the advancement of KR and processing:

Knowledge Extraction, where LLMs improve the extraction of knowledge from diverse sources for applications such as information retrieval and KG construction;
Knowledge Graph Construction, which involves LLMs in tasks such as link prediction and triple extraction from data, albeit with challenges in precision and the management of long-tail entities;
Training LLMs Using KGs, where KGs provide structured knowledge for LLMs, helping to build retrieval-augmented models on the fly, enriching LLMs with world knowledge and increasing their adaptability.

In a report for the University of Leeds in the UK, @pirgova-morgan_looking_2023 explores the potential and practical implications of AI in libraries. The project, forming part of the university's ambitious vision for digital transformation, aims to understand how AI can be effectively integrated into library services. This research looks at both the use of general AI for long-term strategic planning and specific AI applications for improving UX, process optimisation and enhancing the discoverability of collections. The methodology used in this study involves a multi-faceted approach including desk-based assessments, a university-wide survey and expert interviews. Specifically, the study highlights the following key findings:

AI for UX and Process Optimisation: The integration of AI technologies offers substantial opportunities for improving user experiences in libraries. This includes optimising library processes, enhancing collection descriptions, and improving their discoverability.
Challenges and Opportunities of AI Application: While AI presents exciting possibilities, its practical application in library settings faces challenges. These include evaluating specific AI technologies in the unique context of the University of Leeds, ensuring they align with the institution's needs and goals.
Perceptions of AI in Libraries: The report reveals varying perceptions among librarians and users regarding AI. This includes views on how AI can contribute to resilience, awareness of climate change, and practices promoting equality, diversity, and inclusion.
Role of AI in Strategic Library Development: General AI technologies are seen as instrumental in shaping long-term strategies for libraries, highlighting the need for ongoing adaptation and development in response to evolving AI capabilities.
Expert Perspectives on AI in Libraries: Interviews with experts from around the world underscore the importance of understanding both general and specific applications of AI. These insights help in identifying priority areas where AI can significantly enhance library operations and services.
These insights from the University of Leeds report illustrate the complex impact of AI on library services, from enhancing user interaction to influencing strategic decision-making, while also emphasising the importance of adapting AI applications to specific institutional needs. It must also be stated that AI systems lack inherent intelligence and consciousness, and have ultimately been built by people. An important concern, notably with LLMs, is the perceptual illusion of cognitive interaction, where the machine appears to be engaging in dialogue and reasoning, when in fact it is generating content through predictive algorithms [see @ridge_enriching_2023]. Furthermore, regarding the topic of data colonialism, poor people in underprivileged nations are often burdened with the responsibility of cleaning up the toxic repercussions of AI, shielding affluent individuals and prosperous countries from direct exposure to its harmful effects[87]. Concluding this segment, it is essential to perceive ML algorithms as uncertain ‘socio-material configurations’, which can be seen as both powerful and inscrutable, demanding an axiomatic and problem-oriented approach in their understanding and application. @jaton_we_2017 elaborates on this by examining how these algorithms, while technologically complex, are firmly rooted in and shaped by the social, material, and human contexts in which they are developed. Beyond their computational complexity, these algorithms are deeply embedded in the process of constructing ground truths. These ground truths are not inherent or fixed; instead, they emerge from collaborative efforts that reflect the varied inputs of actors. This process underscores the algorithms as socio-material constructs, influenced by the characteristics and contexts of their creators. Understanding algorithms in this light highlights their deep integration with human actions and societal norms, offering a more nuanced view of their design and implementation [see @jaton_assessing_2021; @jaton_groundwork_2023].

3.3.2 Scientific Movements and Guiding Principles

First, 3.3.2.1 examines the movement towards more open and transparent forms of research. Open scholarship is a broad concept that encompasses practices such as open access publishing, open data, open source software, and open educational resources. The subsection explores the benefits and challenges of open scholarship, and how it can help to increase the accessibility and impact of research data. Then, 3.3.2.2 explores the growing trend of involving members of the public in scientific research. Citizen science and citizen humanities involve collaborations between scientists and non-expert individuals, with the aim of generating new knowledge or solving complex problems. The subsubsection examines the benefits and challenges of citizen science and citizen humanities, and how they can help to democratise research. 3.3.2.3 examines the set of guiding principles designed to ensure that research outputs are FAIR. It explores the importance of each data principle for research integrity, reproducibility, and collaboration, and provides examples of how they can be implemented in practice. 3.3.2.4 explores the importance of ethical and culturally sensitive data governance practices for indigenous communities, materialised through CARE. These principles provide a framework for managing data in a way that is consistent with the values and cultural traditions of indigenous communities.
This part also explores the challenges and opportunities of implementing the CARE Principles for Indigenous Data Governance. Finally, 3.3.2.5 explores the concept of 'Collections as Data', a perspective that has emerged from the practical need and desire to improve decades of digital collecting practice. This approach re-conceptualises collections as ordered digital information that is inherently amenable to computational processing.

3.3.2.1 Towards Open Scholarship

According to FOSTER[88], Open Science can be described as '[…] the practice of science in such a way that others can collaborate and contribute, where research data, lab notes and other research processes are freely available, under terms that enable reuse, redistribution and reproduction of the research and its underlying data and methods.' [@foster_open_2019]. In recent years, the principles of Open Science, which historically include Open methodology, Open source, Open data, OA, Open peer review, as well as open educational resources, have become increasingly important as they emphasise transparency, collaboration and accessibility in scientific research [@bezjak_open_2019]. Open methodology refers to the sharing of research processes and methods, allowing other researchers to reproduce and build on existing work [see @vicente-saez_open_2018]. Open source software and tools enable researchers to collaborate, while open data practices promote the sharing of research data in ways that are accessible, discoverable and reusable by others[89]. Open access seeks to remove financial and other barriers to accessing scientific knowledge, while open peer review provides greater transparency and accountability in the publication process. Finally, open educational resources encourage the sharing of teaching and learning materials, thereby facilitating the dissemination of knowledge and skills. @unesco_preliminary_2019 conducted a preliminary study of the technical, financial and legal considerations related to the promotion of Open Science. This research underscored the necessity for a holistic approach to Open Science and stressed the significance of tackling international legal matters, as well as the existing challenges stemming from unequal access to justice, which can hinder global scientific collaboration. This study laid the groundwork for a recommendation on making '[…] multilingual scientific knowledge openly available, accessible and reusable for everyone, to increase scientific collaborations and sharing of information for the benefits of science and society, and to open the processes of scientific knowledge creation' [@unesco_implementation_2021 p. 7]. UNESCO identified five interrelated elements of Open Science, encompassing access to scientific knowledge, infrastructures, the engagement of societal actors, as well as associated and diverse knowledge systems where dialogue is needed. This includes acknowledging the rights of Indigenous peoples and local communities to govern and make decisions on the custodianship, ownership, and administration of data on traditional knowledge and on their lands and resources. Figure 3.10 provides a visual summary of this.

Figure 3.10: Open Science Elements, Redrawn Slide from a Presentation by Ana Persic [@morrison_redrawn_2021 citing @persic_building_2021]

While Open Science offers numerous benefits, it also presents challenges and potential drawbacks that warrant careful consideration. One major concern is the risk of exacerbating inequities between researchers from well-resourced institutions and those from less privileged backgrounds.
Open access publishing often entails significant costs in the form of article processing charges, which can disproportionately burden researchers without adequate funding support [@burchardt_researchers_2014]. Additionally, Open Science practices relying on open protocols may be vulnerable to misuse, such as automated bots excessively crawling open repositories or datasets. This can lead to overloading systems, unauthorised data extraction, or unintended uses of research outputs [see @irish_bots_2023; @li_good_2021]. These risks underscore the importance of balancing openness with safeguards that ensure equitable participation and secure, sustainable access to research materials. These challenges are particularly relevant in the context of DH, a field that harnesses the promise and impact of digital technologies and methodologies for the study and understanding of cultural phenomena. The adoption of Open Science principles has contributed to greater collaboration, transparency and accessibility in research practices in this field. Open data practices are particularly relevant, as they allow scholars to work with large and complex datasets, including digitised archives and social media data. Open educational resources can also be used to support the dissemination of CH literacy and skills, enabling wider audiences to engage with such resources. However, ensuring that such openness does not exacerbate inequities or introduce vulnerabilities requires thoughtful implementation. In addition to the principles of Open Science, the concept of Open Scholarship has been introduced by @tennant_tale_2020 as a broader approach that encompasses the arts and humanities and goes beyond the research community to the wider public. Open Scholarship emphasises the importance of making research and scholarship accessible to a wider audience, including non-experts, educators and policy makers. It can be particularly relevant to the arts and humanities, as they often deal with complex cultural materials and narratives that have wider societal implications. By making their work openly accessible and engaging with non-experts, humanities researchers can contribute to public discourse, promote cultural understanding, and inform policy and decision-making. Open scholarship can also support greater collaboration and innovation within the arts and humanities by enabling researchers to work across disciplines and with a wide range of constituents. For instance, open educational resources can be used to develop collaborative teaching and learning materials that draw on the expertise of scholars and practitioners from different disciplines, while open data practices can facilitate the sharing and reuse of CH materials. Conversely, @knochelmann_open_2019 advocates for the term Open Humanities as a dedicated discourse of openness within the humanities. Notably, he argues that Open Humanities should adapt key Open Science elements to the humanities' unique context. In the case of preprints, the challenges in the humanities, such as limited discipline-specific preprint servers and linguistic diversity, require tailored solutions to encourage adoption. Open peer review in the humanities should accommodate the field's subjectivity and diverse perspectives. Concerns about liberal copyright licenses revolve around potential misrepresentation and plagiarism, highlighting the importance of maintaining scholarly integrity regardless of the chosen license.
Knochelmann's proposal underscores the need for context-sensitive approaches to promote openness and collaboration while respecting the humanities' distinct characteristics. Overall, the principles of Open Science provide a framework for promoting greater collaboration, transparency and accessibility in research practices. Yet, the challenges discussed highlight the need for careful adaptation to address inequities, cybersecurity concerns, and field-specific nuances. The concept of Open Scholarship, which stresses the importance of making research and scholarship accessible to wider audiences, can be instrumental in broadening the impact of research in both the natural sciences and the humanities, as Open Science encourages greater collaboration and innovation across disciplines. Ultimately, this underscores the need for adaptation and positions all academic disciplines as essential contributors to societal understanding, cultural preservation and informed decision-making, while ensuring the sustainability and integrity of open practices.

3.3.2.2 Citizen Science, Citizen Humanities

Citizen Science and Citizen Humanities are approaches that involve the public in scientific and humanities research, respectively. They have become increasingly popular in recent years as a means of democratising research and engaging the public in academic initiatives. Citizen Science, as articulated by @irwin_citizen_1995, embodies a fundamental commitment to sourcing knowledge beyond the confines of academia, with a deliberate focus on addressing the concerns and interests of the public. This perspective underscores the transformative power of Citizen Science, making it a catalyst for a more democratic approach to scientific endeavours. @bonney_citizen_1996's perspective complements this vision by framing Citizen Science as a collaborative process where amateur enthusiasts actively participate in data collection for academic science, all the while gaining a deeper understanding of scientific principles and processes. In this light, Citizen Science emerges as an ideal vehicle for science education and a potent tool for enhancing public appreciation of scientific pursuits. These viewpoints loosely align with the Oxford English Dictionary's definition, characterising Citizen Science as 'scientific work undertaken by members of the general public, often in collaboration with or under the direction of professional scientists and scientific institutions' and tracing the earliest evidence of the term to 1989 [@oxford_english_dictionary_citizen_2023]. As such, Citizen Science stands as a harmonious intersection of public engagement, education, and scientific inquiry, amplifying the voice of non-academic contributors and democratising the scientific landscape. The public can play a vital role in data collection, analysis, and interpretation. This involvement can take the form of tracking wildlife sightings, monitoring water quality, or assessing air pollution. By participating in these activities, citizens become direct contributors to the generation of valuable scientific data. The transformative power of Citizen Science extends across a wide spectrum of scientific disciplines, emphasising its capacity to democratise and broaden the reach of scientific endeavours [see @vohland_science_2021].
Citizen Science is a form of co-creation which, whether viewed as an innovation-oriented means of value creation [@jansma_co-creation_2022] or as a more radical form of empowerment [@metz_co-creative_2019], reinforces the democratisation of the research process. It amplifies the voice of non-academic participants in scholarly pursuits, reflecting a profound shift in the way science is conducted. This collaborative model demonstrates how public engagement enriches the scientific landscape, allowing for the inclusion of different perspectives and a wider range of voices in the pursuit of knowledge. Furthermore, engaging in participatory practices also involves elements of 'phronesis' [90] [see @mehlenbacher_expertise_2022], encompassing moral, affective, and care-oriented dimensions. Trust is also a foundational and indispensable element in the landscape of participatory initiatives [see @dahlgren_diversity_2020]. The success and sustainability of projects within Citizen Science heavily rely on establishing and maintaining trust among all stakeholders involved. This trust extends in multiple directions. First and foremost, participants must trust the project organisers and platforms that host these initiatives. They must have confidence that their contributions will be used responsibly and ethically, with respect to their time and effort. When contributors are assured that their involvement is valued and that the data they provide serves a meaningful purpose, their motivation to participate and provide accurate information is bolstered. Conversely, project organisers and institutions also need to instil trust in participants. Transparency in project objectives, methodology and data use is paramount. Clear and consistent communication is essential to address participants' concerns and provide feedback on the impact of their contributions. This two-way trust is the foundation of successful participatory projects and facilitates long-term engagement. Citizen Humanities, for its part, enables members of the public to participate in activities such as crowdsourced transcription, tagging, and annotation of digital CH materials. These activities can help to uncover new knowledge and insights, as well as to make CH materials more accessible to a wider audience [@strasser_citizen_2018]. It is important to note that within the context of these terms, Citizen Science is often regarded as the broader concept, encompassing both Citizen Science and Citizen Humanities. While the primary distinction between the two may, in some cases, appear to be terminological, in practice, they both exemplify the principles of open and inclusive research, akin to the concepts of Open Science and Open Humanities discussed in the preceding subsection. These approaches foster collaboration and engagement between researchers and the public, deepening the public's understanding and appreciation of the research process as a whole [@zourou_citizen_2022]. This inclusive perspective, even if those participatory activities have been more widely used in the natural sciences than in the humanities [@lowry_is_2021], underscores Citizen Science as an umbrella term encompassing both scientific and humanities endeavours, each enriched by the active participation of the public. While Citizen Science and Citizen Humanities involve the public in research, they differ from crowdsourcing projects in several ways.
Crowdsourcing typically involves the outsourcing of tasks to a large group of people, often through online platforms, with the aim of completing a specific task or project [@ridge_crowdsourcing_2017]. In contrast, Citizen Science focuses more on engagement and collaboration, with the goal of involving the public in the research process and generating new knowledge. That being said, there is also a convergence between Citizen Science and crowdsourcing projects. In many cases, Citizen Science initiatives may also involve crowdsourcing tasks, such as collecting or annotating data. Similarly, crowdsourcing projects may involve elements of Citizen Science, particularly when they aim to engage the public in scientific or CH research [@ridge_5_2021]. For instance, @haklay_citizen_2013 [pp. 115-116] distinguishes four categories or levels of participation in Citizen Science projects, each serving as a rung on the ladder of public engagement. The levels are as follows:

Level 1. Crowdsourcing: In this level, citizens act as sensors and volunteered computing resources.

Level 2. Distributed intelligence: Here, citizens serve as basic interpreters and volunteered thinkers.

Level 3. Participatory science: At this stage, citizens actively participate in problem definition and data collection.

Level 4. Extreme Citizen Science: In the highest level, citizens engage in collaborative science that encompasses problem definition, data collection, and analysis.

When applied in the context of Citizen Humanities, public participation takes diverse forms. This involvement can encompass activities such as the public's engagement in archaeological finds recording, as demonstrated by the Finnish Archaeological Finds Recording Linked Open Database (SuALT) project [@wessman_citizen_2019]. Another illustration is the case of the Citizen-Led Urban Environmental Restoration project where 'young citizen scientists [in Jamaica and the United States] worked closely with museum scientists to restore two environmentally degraded urban sites' [@commock_connecting_2023]. In terms of crowdsourcing of CH data, or more broadly in the humanities, @owens_digital_2013 [p. 121] discusses two primary challenges associated with integrating the concept. He highlights that both the terms 'crowd' and 'sourcing' pose certain problems. Successful crowdsourcing initiatives in libraries, archives, and museums, as he notes, typically do not rely on extensive crowds, and they are far from resembling traditional labour outsourcing endeavours. Furthermore, Owens emphasises that the central focus of such initiatives is not on amassing large crowds but rather on cultivating engagement and participation among individuals in the public who have a genuine interest. As Citizen Humanities broadens its scope to encompass a wider public engagement in DH and CH research, successful collaborations between DH and relevant research infrastructures have shown promising results [@fiser_boost_2018; @simpson_zooniverse_2014]. Furthermore, the integration of scientific and curatorial knowledge plays a pivotal role in CH and humanities studies, uncovering previously unknown contextual information within original materials [@france_integrating_2014]. As illustrated by institutions like the National Library of Estonia, the shift towards human-centred approaches and the development of DH services exemplify the expansion of Citizen Humanities [@andresoo_hundred-year-old_2018]. Incorporating user-generated or user-enhanced metadata still presents several challenges [@raemy_applying_2021].
One major challenge is ensuring the quality and consistency of the data. Another challenge is managing the large volume of data generated by users. With increasing numbers of participants and contributions, it can become difficult to process and organise the data in a way that is useful for research and for the broader public. As @dahlgren_diversity_2020 argue: 'Participatory metadata production has been valued for its potential to reduce the workload of the heritage institutions and make possible speedier digitization. However, in practice, little of the resulting metadata has been reinserted into the institutions databases and used in-house by information specialists.' This challenge is compounded by the fact that user-generated metadata may be unstructured, making it more difficult to analyse and interpret. To address these challenges, it can be helpful to have a robust data curation strategy, maintained by a team that can communicate with participants on a regular basis, as well as tools and technologies that enable efficient data processing and analysis. LOD can also be a useful approach for organising and linking diverse sources of information, enabling researchers to incorporate different perspectives and opinions into their analysis. This form of participation often involves micro-tasks, akin to 'puzzle-like' tasks, connecting users closely with the subjects they are describing [see @ridge_enriching_2023]. The dynamics of participatory projects are intriguing and multifaceted. As expressed by @dahlgren_diversity_2020: '[It] is the often tightly curated top–down design of crowdsourcing platforms where participation is wide in terms of numbers of participants but small in terms of what those participants are allowed to do. The second involves the preconception that the crowd per se, because of its sheer size, in some ways represents a diversity of perspectives and experiences, an idea which is often put forward as one of the benefits of participatory metadata production.' Five recommendations outlined by @ridge_recommendations_2023, specifically geared towards the CH domain, can serve as valuable guidance for various participatory endeavours. These recommendations encompass:

Infrastructure: ensuring platform sustainability by supporting existing tools alongside new developments;

Evidencing and Evaluation: creating an evaluation toolkit to emphasise impact and wider benefits;

Skills and Competencies: establishing a self-guided skills assessment tool and workshops for upskilling;

CoP: funding international knowledge-sharing events like informal meetups, low-cost conferences, and peer review panels, ensuring inclusivity beyond limited regional funding projects;

Incorporating Emergent Technologies and Methods: providing support for educational resources and workshops to anticipate the opportunities and implications of emerging technologies.

These recommendations offer a versatile framework that can be applied to various participatory efforts, transcending the boundaries of specific domains and promoting a more inclusive and effective approach to public engagement in research and collaborative initiatives. By adhering to these measures, Citizen Science projects can better flourish, fostering a collaborative and proficient community of practitioners. Rather than creating new infrastructure, research projects should leverage and extend existing ones, such as Zooniverse[91], a generic Citizen Science portal, and FromThePage[92], a transcription platform.
In summary, both Citizen Science and Citizen Humanities represent participatory methods of inquiry. While they have gained popularity, critical discussions regarding their potential limitations, notably in terms of diversity, are integral to their ongoing development. These critical discussions encompass issues like the challenge of addressing notions of volunteer – thus unpaid – labour, the lack of diversity, and countering the dominance of traditional, often exclusive scientific practices [see @stengers_another_2018]. These conversations serve as essential drivers for the evolution of participatory approaches, prompting a reevaluation and refinement of their methodologies to ensure greater inclusivity and equity [@lewenstein_is_2022].

3.3.2.3 FAIR Data Principles

The FAIR data principles[93] were developed to ensure that three types of entities – namely data, metadata, and infrastructures – are Findable, Accessible, Interoperable, and Reusable. The four key principles of FAIR and their underlying 15 sub-elements or facets are as follows [@wilkinson_fair_2016]:

F. Findable — (Meta)data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services.
F1 (Meta)data are assigned a globally unique and persistent identifier (PID)
F2 Data are described with rich metadata (defined by R1)
F3 Metadata clearly and explicitly include the identifier of the data they describe
F4 (Meta)data are registered or indexed in a searchable resource

A. Accessible — Once the user finds the required data, she/he/they need to know how the data can be accessed, possibly including authentication and authorisation.
A1 (Meta)data are retrievable by their identifier using a standardised communications protocol
A1.1 The protocol is open, free, and universally implementable
A1.2 The protocol allows for an authentication and authorisation procedure, where necessary
A2 Metadata are accessible, even when the data are no longer available

I. Interoperable — The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing.
I1 (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
I2 (Meta)data use vocabularies that follow FAIR principles
I3 (Meta)data include qualified references to other (meta)data

R. Reusable — The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings.
R1 (Meta)data are richly described with a plurality of accurate and relevant attributes
R1.1 (Meta)data are released with a clear and accessible data usage license
R1.2 (Meta)data are associated with detailed provenance
R1.3 (Meta)data meet domain-relevant community standards

Originally introduced to improve data management and sharing in the life sciences, the FAIR principles have evolved into a widely adopted framework that transcends research disciplines. They have been adopted in a wide range of fields, including astronomy [@otoole_fair_2022], genomics [@corpas_fair_2018], environmental science [@crystal-ornelas_enabling_2022] and the humanities. In particular, FAIR principles have been applied to make historical archives, artworks or linguistic datasets more openly available for human users and search engines.
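To give a concrete flavour of how some of these facets can be implemented, the following is a minimal sketch of a machine-readable dataset description, assuming Python with the rdflib library; the DOI, title and property values are placeholders rather than a real record.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dcterms", DCTERMS)

# F1: a globally unique, persistent identifier (hypothetical DOI)
dataset = URIRef("https://doi.org/10.5281/zenodo.0000000")

# F2/R1: rich, machine-readable metadata about the dataset
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Example photographic collection metadata")))
g.add((dataset, DCTERMS.description, Literal("Descriptive metadata for a digitised collection.")))

# R1.1: a clear, accessible usage licence
g.add((dataset, DCTERMS.license, URIRef("https://creativecommons.org/licenses/by/4.0/")))

# R1.2: provenance of the (meta)data
g.add((dataset, DCTERMS.provenance, Literal("Digitised from analogue prints in 2021.")))

print(g.serialize(format="turtle"))
```

Such a record touches on F1 (a globally unique PID), R1.1 (an explicit licence) and R1.2 (provenance); findability in the sense of F4 additionally requires registering the record in a searchable resource.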
Moreover, CHIs have embraced FAIR principles as guidelines and best practices, employing them in the deployment of repositories, virtual research environments or data platforms [@hahnel_how_2020; @beretta_challenge_2021]. Yet, the concept of FAIR data management practices in the humanities is not always straightforward, as demonstrated by @gualandi_what_2022 at the University of Bologna in Italy. The study involved interviews with 19 researchers from the Department of Classical Philology and Italian Studies to investigate the concept of 'data' in the humanities, particularly in relation to the FAIR principles. The study identified 13 types of research data based on participant input, such as publications, primary sources (manuscripts, artworks), digital representations of CH resources, but also websites, events, or standards. This suggests that, within FAIR, 'data' should encompass all inputs and outputs of humanities research. The research also emphasised the importance of methodologies and collaboration in managing research effectively, stressing the need for clarity and consensus in applying FAIR data principles within the field. Indeed, implementing FAIR can be complex due to the variety of data types and existing practices. Such complexity requires structured methods. As @jacobsen_fair_2020 [p. 11] point out, FAIR is not a standard; it is a guide that needs implementations based on interpretations. Similarly, @dunning_are_2017 [pp. 187-188] emphasise the multifaceted nature of the FAIR Data Principles and the need to view compliance as an aspirational objective. Their research reveals challenges in achieving full compliance, with specific difficulty in the Interoperable and Re-usable facets. They advocate for basic policy implementation in areas like PIDs, metadata, licensing, and protocols, alongside transparent documentation. Additionally, the authors stress the importance of using HTTPS – an extension of HTTP (see 3.4.1) – to ensure secure data transmission and accessibility. Finally, they emphasise the importance of collaboration between (data) archivists and researchers. @go_fair_fairification_2016 – an initiative that aims to implement the FAIR data principles – outlined a seven-step FAIRification process, which includes essential stages in the transformation of data, as illustrated by Figure 3.11. These steps begin with the retrieval of non-FAIR data, followed by in-depth analysis to understand the content and structure of the data. The process then requires the creation of a semantic model that accurately defines the meaning and relationships of the data in a computational way, often involving the integration of existing ontologies and vocabularies. The fourth step involves linking data through the application of Semantic Web technologies, thereby improving interoperability and integration with disparate data sources. In addition, the assignment of a clear licence is highlighted as a separate step, emphasising its key role in enabling data reuse and open access. As a sixth step, metadata need to be assigned in order to support data discovery and access. Finally, the FAIRified data are deployed or published with their associated metadata and licence, ensuring that they can be accessed and discovered by search engines, even if authentication and authorisation requirements are in place. As a result, the new FAIR dataset can more conveniently be aggregated with other data sources, making it more straightforward to raise research questions across multiple sources.

Figure 3.11: The FAIRification Process. Adapted from @go_fair_fairification_2016
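As a rough illustration of the middle steps of this process, the sketch below maps a hypothetical legacy CSV export onto a semantic model and attaches an explicit licence, assuming Python with rdflib and Schema.org as the vocabulary; the row content and the example.org URIs are fabricated for the example.

```python
import csv
import io

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

SCHEMA = Namespace("https://schema.org/")

# Step 1: retrieve non-FAIR data (an in-memory CSV standing in for a legacy export)
raw = "id,title,year\n42,Market square in Fribourg,1934\n"

g = Graph()
g.bind("schema", SCHEMA)

# Steps 2-4: analyse the rows and re-express them with a semantic model, minting URIs
for row in csv.DictReader(io.StringIO(raw)):
    photo = URIRef(f"https://www.example.org/photographs/{row['id']}")
    g.add((photo, RDF.type, SCHEMA.Photograph))
    g.add((photo, SCHEMA.name, Literal(row["title"])))
    g.add((photo, SCHEMA.dateCreated, Literal(row["year"])))
    # Step 5: attach an explicit licence to each resource
    g.add((photo, SCHEMA.license, URIRef("https://creativecommons.org/licenses/by/4.0/")))

# Steps 6-7: the graph, together with its metadata, can now be published and aggregated
print(g.serialize(format="turtle"))
```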
A further illustration of how FAIR can be deployed is the conceptualisation of the FDO, which includes a strong binding of various types of metadata [@schultes_fair_2019 pp. 7-9]. The authors of @european_commission_directorate_general_for_research_and_innovation_turning_2018 [p. 35] underline that the establishment of a FAIR-compliant ecosystem hinges on the FDO concept, an implementation framework to develop scalable cross-disciplinary capabilities. As illustrated in Figure 3.12, data must be assigned PIDs and accompanied by detailed metadata to ensure reliable discoverability, usability and citation. They also argue for using widely accepted file formats and for adhering to community-specific metadata standards and vocabularies to support interoperability and reuse.

Figure 3.12: The FDO Model [@european_commission_directorate_general_for_research_and_innovation_turning_2018]

@soiland-reyes_updating_2022 highlight the potential of LOD to drive the adoption of FDO within research infrastructures. While this approach provides specifications and tools, the proliferation of standards and metadata vocabularies poses challenges to interoperability and implementation. To address these hurdles, the authors present the use of FAIR Signposting[94], which enables straightforward navigation to core FDO properties without the need for complex content negotiation heuristics. In summary, the FAIR data principles comprise four key principles, subdivided into 15 facets, that provide a comprehensive framework for data management and sharing. While processes are in place to facilitate their implementation, the path to FAIRness can be complex, with interoperability and compliance challenges. A key element is the thoughtful mapping of different metadata standards and the strategic incorporation of Linked Data technologies. The FDO approach is equally relevant to the CH sector, supporting the preservation, accessibility and sharing of CH data and resources. The sharing of code, accompanied by comprehensive documentation, also enhances such an ecosystem by facilitating the exchange of valuable technical knowledge and resources.

3.3.2.4 CARE Principles for Indigenous Data Governance

The CARE[95] Principles were developed to protect Indigenous data sovereignty [@carroll_care_2020] as complementary guidelines to FAIR. The principles are as follows:

C. Collective Benefit: Data should be collected and used in a way that benefits the community as a whole, rather than just individuals or organisations.
A. Authority to Control: Indigenous communities should have control over their own data, including how it is collected, stored, and used.
R. Responsibility: Those who collect and use Indigenous data have a responsibility to ensure that it is used ethically and responsibly.
E. Ethics: Indigenous data governance should be guided by ethical principles that reflect the values and beliefs of the community.

The concept of adhering to the CARE principles is vital for promoting equitable data practices. The CARE Principles are built upon existing data reuse principles like FAIR, but they also integrate the efforts of Indigenous-led networks focused on Indigenous data governance and research control. While FAIR emphasises data accessibility, CARE goes beyond it by considering actions aligned with the needs and intentions of individuals and communities connected to the data [@carroll_operationalizing_2021].
By embedding CARE-informed data practices into project design, the ethical and responsible use of Indigenous data can be enabled to improve inclusive policies and services [@robinson_caring_2021].

3.3.2.5 Collections as Data

In the same vein as FAIR and CARE, mention should be made of the Vancouver Statement on Collections as Data, which originated from a meeting of GLAM practitioners in Vancouver, Canada in April 2023 and builds on the foundational work[96] done in 2017 [@padilla_always_2017]. The statement highlights the growing global engagement with collections as data. It promotes the responsible computational use of collections to empower memory, knowledge and data practitioners. It emphasises ethical concerns, openness and participatory design, as well as the need for transparent documentation and sustainable infrastructure. The statement, comprising ten recommendations, also recognises the potential impact of data consumption by AI, and the importance of considering climate impacts and exploitative labour [@padilla_vancouver_2023]. More specifically, the following ten principles have been established for anyone with (meta)data stewardship responsibilities:

1. Collections as Data development aims to encourage computational use of digitised and born-digital collections.
2. Collections as Data stewards are guided by ongoing ethical commitments.
3. Collections as Data stewards aim to lower barriers to use.
4. Collections as Data designed for everyone serve no one.
5. Shared documentation helps others find a path to doing the work.
6. Collections as Data should be made openly accessible by default, except in cases where ethical or legal obligations preclude it.
7. Collections as Data development values interoperability.
8. Collections as Data stewards work transparently in order to develop trustworthy, long-lived collections.
9. Data as well as the data that describe those data are considered in scope.
10. The development of collections as data is an ongoing process and does not necessarily conclude with a final version.

In a final report, @padilla_collections_2023 underscore the transformative potential of the Collections as Data paradigm, particularly in the context of GLAMs. The principles and case studies highlighted in the report offer a roadmap for organisations to responsibly and ethically engage with their collections in the digital era. It is imperative to recognise that the journey towards fully realising its potential is ongoing and requires a commitment to continual evaluation and adaptation. This involves not only adhering to established principles but also being responsive to emerging technological trends, societal changes, and evolving ethical considerations. The role of AI in shaping the future of Collections as Data is particularly noteworthy. As AI continues to advance, it offers both opportunities and challenges in terms of enhancing access and insights into collections while also necessitating careful consideration of ethical implications, such as bias and privacy. Furthermore, the growing emphasis on climate impacts and sustainable practices in data stewardship is a crucial aspect that aligns with global efforts towards environmental responsibility.
Building on the discussion of the principles and initiatives surrounding Collections as Data, an in-depth analysis was carried out to assess the compliance of repositories, projects and platforms from six organisations with the checklist discussed below, namely the British Library[97], the National Library of Scotland[98], LoC[99], the Royal Danish Library[100], Meemoo[101], and the Miguel de Cervantes Virtual Library[102] [@candelaChecklistPublishCollections2023a p. 13]. Although several institutions have opened access to their collections through APIs, such as IIIF's capabilities, challenges remain in fully embracing the Collections as Data principles. Barriers include resource limitations and the balance between making collections widely available through simplified access and downloads. In addition, different items within the checklist may require different levels of maturity and prioritisation, often requiring collaborative efforts. Initial results show that the checklist is a valuable tool for identifying relevant issues for individual institutions, although prioritisation may vary according to context and user needs. Collaborative initiatives between institutions are underway to improve the practical implementation and user experience, particularly in the structuring of datasets [@candelaChecklistPublishCollections2023a pp. 20-21]. While there are still relatively few examples of institutions that have fully adopted the Collections as Data principles, several case studies – such as the one at the Royal Library of Belgium, materialised through DATA-KBR-BE[103] [see @chambers_collections_2021] – and initiatives offer valuable insights. For instance, @candelaChecklistPublishCollections2023a [p. 7] outline a checklist tailored to GLAM institutions to publish Collections as Data[104]. They devised 11 criteria, including the provision of clear licensing for dataset reuse without restrictions, citation guidelines, comprehensive documentation, the use of public platforms, sharing examples of dataset use, structuring the data, providing machine-readable metadata, participation in collaborative edition platforms, offering API access to the repository, developing a dedicated portal page, and defining clear terms of use. These recommendations serve as a structured framework to enhance accessibility, usability, and interoperability, fostering engagement with cultural and historical collections. A further notable advancement in the area of publishing Collections as Data is the contribution of @alkemade_datasheets_2023. They have outlined a series of recommendations for developing datasheets, or modular templates, designed for CH datasets. This initiative holds significant importance for GLAMs, facilitating the structured organisation of their data, notably for seamless integration with ML tools, for which they propose providing a description of how content has been influenced by digitisation. Their work highlights the need for documentation, focusing on tailored metrics, biases, and system integration. The proposed datasheets aim to detail the creation, selection, and digitisation processes, enhancing transparency and addressing the distinctive challenges of digital CH data. Emphasising a narrative approach to articulating biases, the authors acknowledge the complex historical context and ethical implications.
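Purely as an illustration of how such a modular datasheet might be expressed in a machine-actionable form, the following Python dataclass sketches a handful of the elements discussed above; the field names and sample values are hypothetical and should not be read as the template proposed by @alkemade_datasheets_2023.

```python
from dataclasses import dataclass, field

@dataclass
class CollectionDatasheet:
    """Hypothetical, minimal datasheet for a CH dataset, loosely following
    the elements discussed above: creation, selection, digitisation, and
    known biases, articulated narratively."""
    title: str
    creation: str        # how the original collection came into being
    selection: str       # what was included or excluded, and why
    digitisation: str    # how digitisation shaped or altered the content
    known_biases: list[str] = field(default_factory=list)

sheet = CollectionDatasheet(
    title="Example photographic collection",
    creation="Assembled by a folklore society between 1930 and 1950.",
    selection="Only prints with legible captions were retained.",
    digitisation="Scanned at 300 dpi; original colour profiles not preserved.",
    known_biases=["Urban scenes over-represented", "Few named photographers"],
)
print(sheet.title)
```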
3.4 Open Web Platform and Linked Data

The web, created at CERN in 1989 by Tim Berners-Lee[105], has enabled scholars and CH practitioners to access and analyse vast amounts of data in new ways, thereby opening the door to the creation of federated datasets and KGs. At the heart of this transformation are two pivotal concepts: the Open Web Platform and Linked Data. The Open Web Platform refers to a set of technologies and standards that allow for the creation and sharing of content on the web. Linked Data, on the other hand, refers to a set of principles and technologies that enable the publication and interlinking of data on the web, creating a web of data that can be easily navigated and used by humans and machines alike. Recognising the web as an environment that supports a wide range of applications beyond traditional browser-based interactions is becoming increasingly important. Platforms such as social networks like Facebook, Twitter, Instagram, and Mastodon, streaming services such as Netflix and Disney+, as well as cloud-based applications, all leverage web technologies even when not accessed via a traditional web browser. These platforms are integral to the web ecosystem, highlighting the web's role as a foundational platform for diverse digital interactions and data exchanges. Much of what we know today about the Internet is the result of developments by many individuals and organisations. A significant milestone, however, was the development of the TCP/IP protocol by Vinton Cerf and Robert E. Kahn in the 1970s [see @cerf_protocol_1974]. This protocol became the standard networking protocol on the ARPANET in 1983, marking the beginning of the modern Internet [@leiner_past_1997]. Understanding the differentiation between the Internet and the web is crucial. The former is a global network of interconnected computers that communicate using Internet protocols, forming the infrastructure that enables online communication. The web, or World Wide Web, is a service built on top of the Internet, leveraging HTTP to transmit data. While the Internet provides the underlying connectivity, the web offers a way to access and share information through websites and links. This differentiation is vital in comprehending how the web, as a part of the Internet, has evolved into a versatile and ubiquitous platform supporting a wide array of applications. This section, divided into five subsections, explores some of the key concepts underlying the Open Web Platform and Linked Data, and their applications in the CH field. First, 3.4.1 examines the foundational principles and technologies that underpin the Open Web Platform. This includes an overview of principles, protocols such as HTTP, and the use of URIs to identify resources on the web. This part also explores different types of web architectures, such as the client-server model or the concept of web services, which allow for the exchange of data and functionality across different applications and systems. 3.4.2 explores the vision of the web as a giant, interconnected database of structured data that can be queried and manipulated by machines. The subsection examines the technologies and standards that make up the Semantic Web, including RDF, RDFS, OWL, and SPARQL. Subsection 3.4.3 examines the set of principles designed to promote the publication and interlinking of data on the web.
The subsection explores the four principles of Linked Data: using URIs to identify resources, using HTTP to retrieve resources, providing machine-readable data, and linking data to other data. Subsection 3.4.4 examines the set of criteria for publishing data on the web in a way that makes it easily discoverable, accessible, and usable. The subsection describes the Five-Star and Seven-Star deployment schemes, which include criteria such as providing data in a structured format, using open standards, and providing a machine-readable license. Finally, 3.4.5 explores the specific application of LOD in the CH domain. The subsection provides examples of how CHIs such as museums, libraries, and archives are using Linked Data to make their datasets more accessible and discoverable on the web. Overall, this section provides a comprehensive overview of the key concepts and technologies underlying the web as an open and linked platform, and their applications in the CH field and, more broadly, for any scientific endeavour, as it was with science that the web started [@nelson_d-lib_2022 citing @berners-lee_worldwideweb_1991]. Through exploring these concepts, we can gain a deeper understanding of how the web is evolving into a more open, interconnected, and data-driven platform, and how this evolution is transforming the way we access, use, and share information.

3.4.1 Web Architecture

The web architecture has played a very important role in the development of scholarly research and CH practices, enabling new forms of collaboration, data sharing and interdisciplinary research. By providing a standardised and interoperable framework based on open standards for sharing and accessing data [@berners-lee_long_2010], it has facilitated the open exchange of information, even if citation has always been an issue, particularly for scholarly outputs [@lagoze_web-based_2012 p. 2223]. Web architecture is a conceptual framework led by the W3C that underpins and sustains the World Wide Web [@jacobs_architecture_2004], which was created to be 'a pool of human knowledge' [@berners-lee_world-wide_1994]. It encompasses the architectural bases of identification, interaction, and format – also referred to as representation – where HTTP provides the technical mechanisms for transmitting and accessing information. The web architecture is based on a set of identifiers, such as URIs, which are used to uniquely identify resources on the web. These identifiers play a crucial role in enabling users to find, access, and share information on the web, and they help to ensure that web-based systems are both user-friendly and interoperable. Here, it is valuable to distinguish between three key concepts: URI, URL, and URN, as a URI can be further classified as a locator, a name, or both [@berners-lee_uniform_2005 p. 7] – as shown in Figure 3.13.

URI: It is the overarching term encompassing both URLs and URNs. It serves as a generic identifier for any resource on the web. URIs can be used to uniquely identify resources, regardless of the specific naming or addressing scheme employed.

URL: It is a subset of URIs and refers to web addresses that specify not only the resource's identity but also its location or how to access it. URLs often include the protocol (such as HTTP) and the resource's specific location (e.g., a domain and path).

URN: It is another subset of URIs that emphasises the resource's identity rather than its location or how to access it.
URNs are designed to be persistent and unique, making them suitable for resources that are intended to be recognised and referenced over time. While URLs may change as resources move or evolve, URNs should remain constant.

Figure 3.13: Overlap and Difference between URI, URL, and URN

Interaction between web agents, i.e. persons or pieces of software acting in the information space on behalf of a person, entity or process, over a network involves URIs, messages and data. Web protocols, such as HTTP, are message-based. Messages can contain data, resource metadata, message data, and even metadata about the metadata of the message, typically for integrity checking [@jacobs_architecture_2004]. The web architecture allows for multiple representations of a resource. In this context, a data format specification becomes pivotal, encapsulating an agreement on how to correctly interpret the representation of data, as articulated by @jacobs_architecture_2004: 'A data format specification embodies an agreement on the correct interpretation of representation data. The first data format used on the Web was Hypertext Markup Language (HTML). Since then, data formats have grown in number. Web architecture does not constrain which data formats content providers can use. This flexibility is important because there is constant evolution in applications, resulting in new data formats and refinements of existing formats. Although Web architecture allows for the deployment of new data formats, the creation and deployment of new formats (and agents able to handle them) is expensive. Thus, before inventing a new data format (or "meta" format such as XML), designers should carefully consider re-using one that is already available.' Access can also be mediated by content negotiation, which is a mechanism employed in web communication to determine the most appropriate representation of a resource to be sent to a client based on the client's preferences and the available representations [@lagoze_web-based_2012 pp. 2223-2224].
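A minimal sketch of content negotiation from the client side could look as follows, assuming Python with the requests library and a hypothetical resource under example.org: the client ranks its preferred media types in the Accept header, and the server answers with the representation it deems most appropriate.

```python
import requests

# Hypothetical URI of a resource that offers several representations
uri = "https://www.example.org/julien-a-raemy"

# The Accept header ranks the client's preferred media types:
# JSON-LD first, HTML as a lower-quality (q=0.5) fallback
response = requests.get(
    uri,
    headers={"Accept": "application/ld+json, text/html;q=0.5"},
    timeout=10,
)

# The server signals the chosen representation in the Content-Type header
print(response.status_code, response.headers.get("Content-Type"))
```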
At its core, web architecture is based on a set of architectural principles that guide the design and development of web-based systems and applications. These principles include concepts such as orthogonality, extensibility, error handling, and protocol-based interoperability. Orthogonality allows the evolution of identification, interaction, and representation independently. Extensibility is key, enabling technology to adapt without compromising interoperability. Error handling addresses diverse errors, from predictable to unpredictable, ensuring seamless correction. Finally, the web's protocol-based interoperability fosters communication across varied contexts, outlasting entities and facilitating the longevity of shared technology [@jacobs_architecture_2004]. Overall, these principles help to ensure that the web remains robust, reliable, and flexible. Web architectures can be categorised into several types, each offering a specific approach to designing and structuring web-based systems. Here, I will focus on the following three types of architectures, shown in Figure 3.14: the client-server model, the three-tier model, and SOA.

Figure 3.14: Types of Web Architectures: Client-Server Model, Three-Tier Model, SOA

The client-server model partitions the responsibilities between two key components: the client, which represents the user interface or user-facing part of the system, and the server, which is responsible for storing and serving data. In this model, clients and servers communicate to perform various functions, such as requesting and delivering information [@oluwatosin_client-server_2014 pp. 67-68]. The three-tier model is another significant web architecture that introduces an additional layer between the client and server, resulting in a three-part structure. This architecture is designed to further segregate and manage the system's components [@wijegunaratne_three-tier_1998 pp. 41-42]. The three tiers typically consist of the presentation tier (the user interface), the application tier (responsible for logic and processing), and the data tier (where data storage and retrieval occur). SOA is a web architecture that emphasises the creation and utilisation of services as the central building blocks of a system. Services in this context are self-contained, modular units of functionality that can be accessed and used independently by various components of a web application. These services are designed to be loosely coupled, meaning they can interact with other services without a deep dependency on one another. Overall, 'SOA is a paradigm for organizing and packaging units of functionality as distinct services, making them available across a network to be invoked via defined interfaces, and combining them into solutions to business problems.' [@laskey_service_2009 p. 101]. SOA can encompass various communication protocols, such as REST, which is a prominent architectural style for designing networked applications [@fielding_architectural_2000], primarily leveraging HTTP. RESTful services, i.e. applications that comply with the REST constraints, are designed to work with existing capabilities rather than creating new standards, frameworks and technologies [@battle_bridging_2008 p. 62]. These services are built around a set of constraints, including statelessness, a uniform interface, resource-based identification, and the use of standard request methods such as GET, POST, PUT, and DELETE [@tilkov_brief_2017]. The following are all the specified request methods enabling clients to perform a wide range of operations on resources[106] [@fielding_http_2022 p. 72]:

GET: Transfer a current representation of the target resource.
HEAD: Same as GET, but only transfer the status line and header section.
POST: Perform resource-specific processing on the request payload.
PUT: Replace all current representations of the target resource with the request payload.
DELETE: Remove all current representations of the target resource.
CONNECT: Establish a tunnel to the server identified by the target resource.
OPTIONS: Describe the communication options for the target resource.
TRACE: Perform a message loop-back test along the path to the target resource.

RESTful services, with their emphasis on using standardised HTTP methods and resource-based identification, offer a versatile means of designing web services and APIs. Their simplicity and compatibility with the web's core protocols make them a practical choice for implementing various web-based applications. In the context of the Semantic Web, RESTful services can serve as a crucial component for accessing and exchanging graph data [see @lee_learning_2011].

3.4.2 The Semantic Web

The Semantic Web is 'not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation' [@berners-lee_semantic_2001 p. 35]. It was already part of @berners-lee_realising_1999's vision and prediction that the web, in its next phase, could be understood by machines, i.e. shifting from a traditional web of documents to a web of data.
@bauer_linked_2012 [p. 25] articulates that '[t]he basic idea of a semantic web is to provide cost-efficient ways to publish information in distributed environments. To reduce costs when it comes to transferring information among systems, standards play the most crucial role.' At the heart of the Semantic Web lies the foundation of RDF. The original RDF specification, known as the RDF Model and Syntax, serves as the underlying mechanism that establishes the basic framework of RDF. This framework provides the cornerstone to facilitate the exchange of data among automated processes [@lassila_resource_1999]. A fundamental component within RDF is the RDF triple, as shown in Equation 3.1, comprising three essential elements: the subject (s), the predicate (p), and the object (o). In an RDF triple, the subject is the resource or entity about which a statement is made, the predicate is the relationship or property describing that statement, and the object is the value or resource associated with the statement.

s —p→ o

Equation 3.1: Triple Pattern Notation

RDF statements are reminiscent of the semiotic triangle of @ogden_meaning_1930 [p. 11] — as illustrated in Figure 3.15 — where the referent is tantamount to the predicate of a triple. This analogy emphasises the intrinsic relationship between communication, representation and knowledge organisation. It highlights how both language and structured data rely on the establishment of connections and relationships to effectively convey meaning.

Figure 3.15: The Semiotic Triangle by [@ogden_meaning_1930]

Figure 3.16 is an RDF graph about myself and where I was born, leveraging mostly Schema.org[107], a collaborative project and Linked Data vocabulary used to create structured data markup on websites. This graph consists of vertices and edges, where vertices can be either URIs or literal values, and the edges represent relationships between them. In plain language, the graph asserts that there is a person represented by the URL https://www.example.org/julien-a-raemy, who has the given name 'Julien Antoine' and the family name 'Raemy'. The person's birthplace is specified as a URL from Wikidata, which is of type schema:Place. Additionally, there is a statement indicating that the birthplace is named 'Fribourg'.

Figure 3.16: Example of an RDF Graph

In the subject-predicate-object syntax of RDF, the subject can be either a URI or a blank node[108]. The predicate is a URI, like schema:givenName, and its aim is to establish connections between subjects and objects, describing the nature of the relationship. The object is either a URI, a blank node or a literal, such as 'Julien Antoine' or 'Fribourg'. Objects can also act as subjects if they are identifiable, allowing for the expansion and interconnection of RDF graphs. The original specification proved too broad, leading to confusion, and a subsequent effort yielded an updated specification and new documents such as RDF/XML [@beckett_rdfxml_2004], which expresses an RDF graph as XML; this syntax specification became a recommendation in 2004 and was later revised in 2014 as part of the RDF 1.1 document set [@gandon_rdf_2014], which also introduces the notion of an RDF dataset that can represent multiple graphs [@cyganiak_rdf_2014]. The code snippet below is the RDF/XML serialisation of the earlier graph. (…)
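As a complement to these serialisations, the following sketch reconstructs the example graph programmatically, assuming Python with the rdflib library; the example.org URI mirrors the figure, while the Wikidata identifier for Fribourg is an assumption made for the sake of the example. It also runs a small SPARQL query over the graph, anticipating the query layer discussed below.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

SCHEMA = Namespace("https://schema.org/")

g = Graph()
g.bind("schema", SCHEMA)

person = URIRef("https://www.example.org/julien-a-raemy")
# Assumed Wikidata URI for Fribourg, used here only as a placeholder
birthplace = URIRef("http://www.wikidata.org/entity/Q36378")

# Each g.add() asserts one subject-predicate-object triple (see Equation 3.1)
g.add((person, RDF.type, SCHEMA.Person))
g.add((person, SCHEMA.givenName, Literal("Julien Antoine")))
g.add((person, SCHEMA.familyName, Literal("Raemy")))
g.add((person, SCHEMA.birthPlace, birthplace))
g.add((birthplace, RDF.type, SCHEMA.Place))
g.add((birthplace, SCHEMA.name, Literal("Fribourg")))

# A SPARQL query over the same graph: in which place was the person born?
query = """
PREFIX schema: <https://schema.org/>
SELECT ?name WHERE {
    ?person schema:birthPlace ?place .
    ?place schema:name ?name .
}
"""
for row in g.query(query):
    print(row.name)  # prints: Fribourg

# The same graph can be serialised to Turtle (or RDF/XML, JSON-LD, etc.)
print(g.serialize(format="turtle"))
```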
@idehen_semantic_2017 highlights a significant concern regarding the earlier representations of the Semantic Web and how it is portrayed. These portrayals often place undue emphasis on the pivotal role of XML as an ostensibly obligatory component in Semantic Web development. To him, this historical perspective, particularly prominent around the year 2000, erroneously positioned XML as a superior alternative to HTML for constructing the Semantic Web. As illustrated by Figure 3.17, @idehen_semantic_2017's revision embodies a Semantic Web layer cake that encompasses several technical or conceptual components:

Smart Applications and Services: These systems are constructed declaratively, as opposed to using an imperative approach, with a flexible integration of data models, interaction, and visualisation.

Trust: It is established through verifiable claims regarding identity, the source of content, and related issues.

Proof: It provides a basis for building trust, such as leveraging authentication tiers.

Transmission Security: It pertains to safeguarding data during its transit over networks. This protection is achieved by implementing established protocols, such as TLS, which includes inherent cryptographic support for ensuring the confidentiality and integrity of data as it travels across communication channels.

Unifying Logic: In this component, FOL[109] assumes a central role by providing the fundamental schema for modelling and comprehending data. These propositions serve as the core building blocks for problem-solving and decision-making, offering a universal and abstract structure that can be applied across various domains and applications[110].

Rules: They serve as the foundation for conducting reasoning and inference, such as using SHACL, which is used to specify integrity constraints for data entry when constructing structured data with RDF. At this level, mapping languages such as R2RML and RML have a part to play. R2RML is a powerful language for expressing customised mappings from relational databases to RDF datasets [see @das_r2rml_2012]. RML, on the other hand, extends these capabilities beyond relational databases, allowing for the transformation of various types of data sources, including CSV, XML, and JSON files, into RDF [see @dimou_rdf_2022]. Both languages are instrumental in bridging the gap between non-RDF data sources and the Semantic Web.

Query: This can be achieved by employing SPARQL for retrieving and manipulating data on structured RDF statements.

Dictionaries: Collections of formal definitions (also referred to as vocabularies or ontologies) that describe entity and entity relationship types. They provide a structured framework for defining and organising concepts, properties, and relationships, which aids in modelling knowledge in a systematic and machine-readable manner. Some of the commonly used dictionary languages include RDFS, a simple vocabulary language used to define basic schema information for RDF; SKOS, designed for representing and organising some KOSs, particularly in the context of thesauri and taxonomies; as well as OWL.

Abstract Language: This layer leverages the RDF syntax (as shown in Equation 3.1) as a basis.

Sentence Part Identifiers: To identify resources on the web, IRIs[111] or URIs can be used.

Document Types: Different serialisations of RDF exist within the Semantic Web, such as RDF/XML, Turtle or JSON-LD[112].

Semantic Web of Linked Data: The final component that holds the rest together, originating from Tim Berners-Lee's vision (see 3.4.2).
Following on from the components outlined in Figure 3.17, I will look in more detail at further RDF features, serialisations, and RDF-based standards for representing, querying or validating graphs. In doing so, I will touch on some considerations related to the inference and reasoning of RDF graphs. Code Snippet 3.5 is a Turtle serialisation of Figure 3.16. Turtle, a W3C standard, is a notation for expressing RDF data in a structured and machine-readable format [@wood_linked_2014 p. 44]. It is a common syntax used for representing RDF data and allows people to create statements in a more human-friendly manner than an RDF/XML serialisation [@beckett_rdf_2014]. Here are some of the most important features: (…)

3.4.3 Linked Data Principles

3.4.4 Deployment Schemes for Open Data

3.4.5 Linked Open Data in the Cultural Heritage Domain

(…)

3.5 Linked Open Usable Data

(…)

3.6 Characterising Community Practices and Semantic Interoperability

(…)

3.7 Summary and Preliminary Insights

This section provides a summary of what has been discussed in this literature review as well as some preliminary insights with regard to the LOUD ecosystem, chiefly the design principles, communities, standards, and implementations. It follows the flow of the present chapter and is organised into five subsequent parts. Finally, I end with a few reflections on why we ought to care about CH data in the wider sense.

(…)

4. Exploring Relationships through an Actor-Network Theory Lens

As Jim Clifford taught me, we need stories (and theories) that are just big enough to gather up the complexities and keep the edges open and greedy for surprising new and old connections. [@haraway_staying_2016 p. 101]

This chapter serves as the theoretical framework of the dissertation, and its primary goals are to elucidate the theoretical underpinnings and provide a comprehensive toolbox for addressing the identified problem. In the preceding literature review chapter, I highlighted the issue that necessitates attention around interlinking CH. The theoretical framework, sometimes referred to as the ‘toolbox’, comprises the ‘tools’ that will be employed to understand and address this problem. Here, the primary purpose of this chapter is to offer an in-depth exploration of these tools – which comprise various theories, propositions, and concepts – delineating their characteristics, behaviours, historical applications, interrelationships, relevance to the study’s objectives, and potential limitations. Subsequently, the next chapter will elucidate how these tools will be operationalised in the research process. The theoretical framework of this study is firmly rooted in ANT, which will be pursued systematically throughout the research. ANT is a constructivist approach that seeks to elucidate the fundamental dynamics of societies. Unlike traditional perspectives that restrict the concept of an ‘actor’ – or ‘actant’ – to individual humans, ANT expands this notion to encompass non-human and non-individual entities [see @callon_actor-network_1999; @callon_actor_2001; @latour_actor-network_1996; @latour_reassembling_2005]. ANT goes beyond the mere identification of actors and networks; it embodies a comprehensive methodology for exploring the intricate interplay of socio-technical systems.
ANT distinguishes itself not only by recognising both human and non-human entities – from individuals and technological artefacts to organisations and standards – as actors (or actants), but also by examining their roles within heterogeneous networks of aligned interests. This approach facilitates a nuanced understanding of enrolment and translation processes, where diverse interests are aligned to form cohesive networks, and the concept of irreversibility, which describes the stabilisation of these networks over time. In addition, ANT introduces the concepts of black boxes and immutable mobiles, highlighting the persistent yet mobile nature of network elements such as software standards that transcend spatial and temporal boundaries [@lee_actor-network_1997 pp. 468-470]. These concepts are instrumental in dissecting the dynamics of IIIF and Linked Art specifications, which can be considered either full-fledged actors or immutable mobiles, depending on the context of the network under consideration. This dual perspective underscores ANT's role as both a theoretical lens and a methodological tool, providing a robust framework for dissecting the fabric of socio-technical assemblages and enriching our understanding of DH and CH interconnections. Additionally, the theoretical framework is enriched by integrating complementary perspectives from Donna Haraway’s SK, Susan Leigh Star’s BO, and Luciano Floridi’s PI. Each of these frameworks contributes uniquely to our understanding of LOUD and its socio-technical landscape. Haraway’s approach emphasises the contextually-embedded nature of knowledge, underscoring the importance of diverse perspectives in shaping our understanding of technological phenomena [see @haraway_situated_1988]. The concept of BO provides a framework for examining the role of LOUD technologies as mediators among varied groups, highlighting the importance of flexibility and adaptability in technological systems [see @star_institutional_1989; @star_this_2010]. Meanwhile, PI offers a foundational perspective, viewing information as an intrinsic part of the reality that shapes and is shaped by technologies [see @floridi_philosophy_2011]. Collectively, these theories complement the ANT-based approach by providing a multi-faceted understanding of the complexities inherent in LOUD, its technologies, and the communities involved.

This ANT-grounded toolbox is composed of three elements:

Demonstrating how non-human entities exert agency.

Identifying the human and non-human actors involved in these processes.

Investigating the concept of translation and the process by which a network can be represented by a single entity.

For example, when considering the design of the PIA data model, a full-fledged actor in its own right, and more broadly for any KR system, pertinent questions arise about the influences exerted by the various groups of individuals involved in the process. These questions concern not only their interactions with each other but also their impact on the manifestation of the model and, consequently, how KRs can influence the various actors, both during their implementation and throughout their creation. In addition, the data model is composed of parts always bigger than the sum of their individual characteristics, as each part not only contributes to the overall functionality but also embodies a complex network of relationships and interactions.
This viewpoint, inspired by @latour_whole_2012, asserts that in the realm of social connections, following Gabriel Tarde’s monadological approach – i.e., viewing each individual or element as a self-contained universe or ‘monad’ with its own unique properties and relations [see @tarde_social_2000] – individual elements (such as stakeholders or data points) often carry more information and potential than what is apparent when they are viewed solely as components of a larger system. In this perspective, the complexity and richness of each individual element often surpass the aggregate. Thus, the model, potentially acting as a boundary object when not aligned with standardised processes, serves as a site of negotiation and alignment among different stakeholders. As @haraway_staying_2016 [p. 104] poetically puts it, software could also be defined as ‘imploded entities, dense material semiotic “things”’, a notion that underscores the entanglement of information, technology, and materiality.

This chapter is organised into four sections, corresponding to the three aforementioned aspects and an additional section focusing on the revised epistemological foundations. First, Section 4.1 explores the dissolution of rigid distinctions between human and non-human actors, emphasising the dynamic and interdependent nature of such relationships. This exploration is fundamental in understanding the broader implications of standards and community engagements in any field. Then, Section 4.2 examines how collectives composed of differing actors can be assembled into a cohesive network where each entity’s agency and influence are recognised. In this section, the concepts of quasi-objects and BOs are introduced to elucidate the role of shared objects and concepts in mediating and facilitating interactions among diverse groups within a network. Section 4.3 investigates the translation process where actors negotiate, modify, influence, and align their interests and identities in the formation and maintenance of networks. Additionally, PI, particularly the SLMS approach, is introduced here to provide a structured understanding of how information and knowledge are conceptualised, managed, and communicated within these networks, offering a deeper insight into the dynamics of standard adoption and community interaction in collaborative efforts. Finally, Section 4.4 revises the epistemological foundations to address the nuanced inquiries presented later in the empirical chapters. In this section, SK is introduced, emphasising the importance of context-specific and perspective-driven knowledge in shaping our understanding of technological and cultural phenomena. This concept, developed by Donna Haraway, advocates for a more critical and reflexive view of knowledge, recognising that all knowledge is situated within specific cultural, historical, and personal contexts. This approach challenges the notion of objective or universal knowledge, asserting that all understanding is partial, located, and contingent. The incorporation of SK is crucial for comprehending how different actors’ perspectives and experiences influence the implementation and interpretation of LOUD standards. It helps in examining how these standards and community participation in IIIF and Linked Art are perceived and enacted differently across various social fabrics, particularly contrasting settings where these standards and communities are not engaged.
Central to this discussion is a guiding question that serves as a common thread, leading into the main research question: ‘How to situate Linked Open Usable Data and to what extent has LOUD shaped or will shape the perception of Linked Data in the broader context of cultural heritage and digital humanities?’ Throughout these sections, ANT forms the underlying common thread, with the other theories augmenting and enriching this comprehensive theoretical framework. Overall, the theoretical framework will be drawn upon to explore what @manovich_cultural_2017 [pp. 60-61] refers to as ‘everything and everybody’. Borrowing from Haraway’s concept of Tentacular Thinking, this approach recognises the interconnectedness and interdependence of all elements within the research scope, from the minutiae of technical details to the broader societal implications [see @haraway_tentacular_2016]. This comprehensive view is essential for addressing the detailed nuances of technical implementations, yet it is also crucial for understanding their wider societal implications, and for considering the multi-layered complexities involved in the implementation and perception of LOUD.

4.1 Implosion of the Boundaries: Non-humans have Agency

The @oxford_english_dictionary_agency_2023 assigns two primary sets of meanings to the term ‘agency’. The first pertains to ‘a person or organisation acting on behalf of another, or providing a particular service’. The second, of greater relevance to this discussion, relates to an ‘action, capacity to act or exert power’. This second definition is further elaborated as an ‘action or intervention producing a particular effect; means, instrumentality, mediation’. The concept of agency has been a central theme in various philosophical and sociological discourses. In ancient philosophy, both Plato and Aristotle contributed foundational ideas to the concept of agency, each offering distinct perspectives that have significantly influenced subsequent thought. Plato, known for his theory of ideal forms, presented a dualistic view of reality, distinguishing between the world of forms (ideas) and the physical world. Within this framework, he saw agency as the soul’s ability to recognise and conform to these ideal forms [@watson_free_1975 p. 209]. For Plato, true agency involved transcending the physical and sensory world and directing one’s actions according to reason and intellect. This pursuit of knowledge and truth was seen as the highest form of agency, with actions aimed at realising eternal and immutable truths. Plato’s vision of agency is closely tied to knowledge, virtue and the pursuit of the good, as seen in his portrayal of the philosopher-king in the Republic, who governs himself and the state with wisdom and insight [see @plato_republic_360bce]. Aristotle, Plato’s student, offered a more practical and empirical approach. He incorporated agency into his broader ethical framework, placing emphasis on the ability to act virtuously and to make decisions in accordance with a telos or purpose. Aristotle regarded every action and choice as directed towards an end, and this teleological approach is key to understanding his concept of agency. Agency in Aristotle’s philosophy is deeply intertwined with the notions of potentiality and actuality, where potentiality represents inherent capabilities and actuality is their realisation through action. This perspective reinforces the importance of rational deliberation, moral virtue, and the realisation of potentiality in human life.
In addition, Aristotle emphasised the role of choice (prohairesis) and practical wisdom (phronesis) in guiding deliberate, rational and virtuous action [see @charles_aristotle_2017]. These ancient philosophical perspectives, with Plato’s focus on reason and alignment with the ideal, and Aristotle’s emphasis on practical wisdom and virtue, set the stage for later philosophical explorations of agency by modern thinkers such as David Hume (1711-1776), Immanuel Kant (1724-1804), and Georg Wilhelm Friedrich Hegel (1770-1831). In their perspectives, agency is understood as an individual’s capacity to act in the world, based on intentionality and rationality [@pippin_idealism_1991]. Hume’s empiricist approach saw agency as closely tied to the experiences and perceptions of the individual, underlining the role of personal choices and mental states [see @schier_hume_1986]. Kant’s critical philosophy stressed the importance of autonomy and moral law in agency, which he understood as the capacity of individuals to act according to universal moral principles derived from reason. Hegel offered a more dialectical approach, seeing agency as part of a broader historical and social process in which the actions of the individual are interwoven with the unfolding of rational will in history. These classical views of agency focus primarily on human agents and their conscious, intentional actions.

In contrast to these traditional perspectives, the advent of ANT and the works of Bruno Latour, John Law, Madeleine Akrich, and Michel Callon mark a significant shift. These academics argue for a more inclusive understanding of agency, where non-human entities — ranging from technological artefacts to animals and even ideas — can also be agents that influence and shape the course of social events. This perspective is a departure from the anthropocentric view of agency and opens up new ways of understanding social dynamics and networks. The concept of the actant, as introduced by Algirdas Julien Greimas (1917-1992), is pivotal in this context. In Greimas’ semiotic theory, an actant can be any entity, human or non-human, that contributes to the progress of a narrative. This concept significantly expands the traditional narrative framework established by Vladimir Propp (1895-1970), which focused mainly on human characters and their roles in folk tales. Propp’s analysis focused on the actions and roles of these character types, which he categorised into a standardised framework. Greimas’ approach to narratives, as referenced by @boullier_medialab_2018, emphasises the potential of any entity to play a role in a story, thereby expanding the narrative scope.

The agency’s move is based on a well-known but seldom mentioned loan from Greimas’ 1966 semiotics. The concept of “actant” allowed the potential arrangement of any entity that populated the narratives to be aligned beyond Propp’s tradition. While Greimas’ formalism was certainly not preserved, the principle allowed for more open stories to be told and the concept of “allies” to be formalised, in particular, which extended the idea of “adjuvants” and “opponents” (without this being done from a strategic perspective, contrary to some interpretations). [@boullier_medialab_2018]

This idea of the actant resonates strongly within ANT, as it aligns with the theory’s aim to dissolve the strict dichotomy between human and non-human actors. In ANT, actants are not limited to individuals or even sentient beings; they include any entity that can affect or be affected by the network.
This redefinition of agency through the lens of ANT and the concept of the actant is a cornerstone in understanding the complex, interconnected networks that constitute social and technological realms. It allows for a broader and more nuanced understanding of how various elements within a network interact and influence one another, regardless of their traditional classification as human or non-human. ANT provides a radical redefinition of agency, challenging the modernist and post-modernist interpretations. It proposes an amodern viewpoint [@latour_postmodern_1990], dissolving the dichotomy between human and non-human agency, and focusing on the network dynamics in a society where ‘contemporary techno-science consist of intersections or “hybrids” of the human subject, language, and the external world of things, and these hybrids are as real as their constituent’ [@bolter_remediation_1999 p. 57]. Agency, in ANT, includes not just intentional actions but the capacity of any entity to affect or be affected in a relational network [@latour_actor-network_1996]. This expansive view of agency, influenced by the concept of the actant and informed by Greimas’ semiotics, offers a more holistic understanding of actors within networks. Adopting a Latourian approach, researchers such as anthropologists and sociologists are encouraged to observe the balance between human and non-human properties within networks. This balanced observation is crucial for a full understanding of the dynamics within these structures [@latour_we_1993 p. 96]. By acknowledging the agency of both human and non-human agents, and by recognising the blurred boundaries between subjects and objects, researchers can gain deeper insights into the complex interplay of forces that shape social realities.

4.2 Assembling the Collective

Following the exploration of agency in the previous section, the assembly of collectives is examined. This process involves identifying a myriad of actors, both human and non-human, and understanding how they coalesce into actor-networks. The assembly of such a collective, a differing cosmos of socio-technical agents, is predicated on the recognition of each actor’s unique agency and the dynamic interplay of relationships that bind them together. The transformation challenges of tools for DH and object knowledge, as discussed by @camus_digital_2013, highlight the complexities involved in integrating DH with traditional scholarly practices. The author emphasises the need for a nuanced approach to the digitisation and dissemination of CH resources, underscoring the pivotal role of collaborative efforts in bridging the gap between technology and humanities scholarship. This analysis aligns with the ANT perspective, which advocates for recognising the contributions of diverse actors within the wider DH ecosystem. Within the LOUD ecosystem, a diverse set of actors comprises individual contributors, institutions, and several groups and committees, each with its own set of objectives. This ensemble also includes specifications and compatible software that facilitate interoperability, as well as end users. Interestingly, the majority of these end users remain unaware — whether through seamless integration or simply because their interaction does not require conscious recognition — that their digital interactions are often mediated by or compliant with LOUD standards.
This diversity underscores the importance of understanding how different actors, their objectives, and their contributions shape the development and adoption of LOUD specifications and practices. Transitioning from the depiction of the LOUD ecosystem’s varied participants, the concept put forth by @gandon_web_2019 for an envisioned web architecture introduces a collective composed of diverse natural intelligences – such as humans, connected animals and plants – and artificial intelligences, including entities capable of reasoning and learning. This shift marks a deeper recognition of the layered interactions that form the backbone of digital platforms, and points to a future where technology adapts to embrace a wider range of intelligences within the structure of the web. Classifying these diverse actors involves understanding their roles and interactions within the network. Quasi-objects and BOs provide frameworks for this classification, enabling a nuanced understanding of the socio-technical assemblage and facilitating communication among its varied components. The foundation established by ANT, incorporating the concept of quasi-objects from @serres_parasite_2014, challenges traditional categorisations by embodying characteristics of both subjects and objects. This conceptual framework is crucial for appreciating how non-human entities can exhibit agency and actively participate in social networks, thus broadening our understanding of actor-network dynamics. Quasi-objects, existing in a state of flux and embodying characteristics of both subjects and objects, challenge our conventional understanding of agency. Concurrently, the concept of BOs is introduced, enriching the ANT-grounded toolbox by highlighting the role of shared objects and concepts in mediating and facilitating interactions among diverse groups within the network. Unlike quasi-objects, which symbolise a hybrid state between subjectivity and objectivity, BOs focus on interaction and communication. They are crucial in collaborative efforts, especially in diverse and interdisciplinary settings, by maintaining a common identity across various contexts while being interpreted differently in each. Understanding the distinction and the interplay between quasi-objects and BOs is vital for comprehending the dynamics of actor-networks. The introduction of BOs in this section elucidates their role in mediating complex socio-technical interactions, highlighting the importance of BOs in community-based initiatives and their broader impact. Star’s reflection on BOs underscores their significance:

Boundary objects are objects which are both plastic enough to adapt to local needs and constraints of the several parties employing them, yet robust enough to maintain a common identity across sites. They are weakly structured in common use, and become strongly structured in individual-site use. They may be abstract or concrete. They have different meanings in different social worlds but their structure is common enough to more than one world to make them recognizable, a means of translation. The creation and management of boundary objects is key in developing and maintaining coherence across intersecting social worlds [@star_ethnography_1999 p. 393]

The relevance of BOs extends to the restructuring of residual categories through cycles of standardisation attempts that create said boundary objects, as illustrated in Figure 4.1.
This cycle emphasises the negotiation and alignment among different stakeholders, underscoring the adaptive and flexible nature of BOs in managing the complexities of standardisation and the varied interpretations across social worlds [see @star_this_2010].

Figure 4.1: Relationships Between Residual Categories, BOs, and Standardisation Attempts. Adapted from @star_this_2010

The integration of concepts such as quasi-objects and BOs within the ANT-based toolbox necessitates a re-evaluation of existing ontological frameworks. This re-evaluation involves redefining our understanding of agency, action and influence to include a wide range of actors, both human and non-human. It leads to a more nuanced and interconnected understanding of societal dynamics, in which the central role of diverse actors in forming networks is recognised and traditional boundaries between different types of actors are actively re-imagined. In this way, ANT argues for a network-like ontology and social theory that can fully integrate the influence and interactions of disparate actors within society [@latour_actor-network_1996 p. 370]. In engaging with the complexities of assembling the collective, we encounter the necessity to cease replicating the archontic principle, as discussed by @derrida_mal_2008. This principle, which dictates the preservation and accumulation of knowledge under a single authority or location, is challenged by the fluid and distributed nature of ANT's networks. Derrida’s critique invites a rethinking of how knowledge and information are curated and disseminated, echoing the ANT perspective that stresses the distributed, multifaceted interactions of actors within a network. The dialogue with Derrida’s deconstruction of the archive complements the ANT approach by advocating a more open, inclusive understanding of how socio-technical collectives are formed and maintained. For instance, the examination of BOs in the context of queer identities illustrates the potential of these concepts to challenge and redefine traditional typologies, offering new perspectives on identity and community formation within socio-technical networks [see @junginger_categorizing_2021]. Having assembled our collective and identified the diverse actor-networks, the focus shifts to exploring the relationships and communication mechanisms among them. This exploration, conducted in the subsequent section, is crucial for understanding how different actors – ranging from quasi-objects to boundary objects – contribute to and shape the collective narrative and functionality of a given network.

4.3 The Translation Process

The translation process within ANT refers to the dynamics of establishing associations and networks among diverse actors. Latour describes translation as a specialised relation that transforms mediators into coexistent entities without directly transferring causality. This concept underscores the complexity of interactions within networks, emphasising the creation and maintenance of associations that extend beyond mere causal relationships.

[The] word “translation” now takes on a somewhat specialised meaning: a relation that does not transport causality but induces two mediators into coexisting [@latour_reassembling_2005 p. 119]

Associations within ANT evolve continuously, demonstrating the ongoing and emergent nature of actor-networks.
They materialise through the interactions and negotiations among actors. This perspective broadens the understanding of networks by stressing their fluid nature. Understanding the dynamics of actor-networks requires an exploration of the ways in which actors influence each other. This is especially important in order to capture the nuanced roles and interactions within these networks. A key contribution in this area has been made by Latour, who suggests a compelling structure for disentangling these interactions: ‘mediation’. Latour’s conceptualisation is particularly insightful in that it breaks down the nature of influence into distinct yet interrelated processes. The essence of his argument can be summarised as follows:

This capacity of actors to influence each other was defined by @latour_reassembling_2005 as mediation, further broken down into four types: interference, composition, black boxing, and delegation. Interference appears when one actor interferes with the goal of another. In composition, the actors influence the common goal of the network together. Black-boxing is when gradual complexification of actors (and their interrelations) reaches a point where treating the constellation as a single actor becomes more meaningful, and delegation is when meaning and expression is delegated to non-human objects. [@czahajda_live_2022 p. 3]

In the context of the LOUD space, treating a constellation of actors necessitates a focus on delegation, a process pivotal for understanding how meanings and functions are assigned within networks. This emphasis on delegation underscores the necessity of expanding our epistemological horizons to encompass PI. Integrating PI offers a comprehensive framework for analysing how information not only mediates relationships within these networks but also influences the reality of digital ecosystems. Such an expanded perspective is essential for a thorough understanding of the dynamics within the LOUD ecosystem and its broader implications. PI is conceived as a groundbreaking approach that examines the nature and dynamics of information. It explores how information fundamentally structures reality and thereby shapes the entities within it, termed inforgs, or informational organisms. These entities are embedded in information environments where they engage and interact in a vast information ecosystem. @floridi_information_2010’s work illuminates the ways in which information underpins and transforms our understanding of reality, and suggests that living in the information age means recognising our role and identity as part of a complex, interconnected informational world. This perspective invites a deeper reflection on the implications of living in the midst of vast information networks, and urges a re-evaluation of how information influences human identity, society and our wider interaction with the digital and natural worlds. Diving deeper into Floridi’s PI, the LoA concept emerges as a critical tool for dissecting the intricacies of complex computational systems, including LOUD-compliant ecosystems. This approach, fundamental to PI [@ganascia_abstraction_2015], aids in navigating the multifaceted layers and perspectives inherent in these systems [@angius_philosophy_2021], particularly those structured around client-server architectures. LoA provide a structured way to analyse and understand complex systems by breaking them down into different layers or perspectives.
Each level focuses on specific aspects of the system, allowing for a clearer analysis of its components and their interactions. By advocating for a separation of concerns, the LoA framework equips us with a strategic method to manage and simplify complexity, enabling a focused examination of distinct abstraction levels within digital ecosystems [@van_leeuwen_floridis_2014]. Building on the foundation laid by Floridi’s LoA, @selbst_fairness_2019 critique the fair ML field’s reliance on abstraction and modular design for achieving fairness, identifying five abstraction traps that highlight the challenges of applying computational interventions to societal contexts without considering the interplay between social context and technical systems. This critique underscores the relevance of incorporating a socio-technical perspective in this thesis, emphasising the need to make use of STS methodologies in the design process to avoid these abstraction traps. Considering the insights from @selbst_fairness_2019, this thesis will explore four levels of LoA and one transversal dimension within LOUD ecosystems, acknowledging that each can act as its own actor-network or collectively form a singular network. These levels, from low to higher abstraction, include Algorithmic and Computational Processes, Infrastructure, Data Model, and Representation and Display. Societal implications, integrated across all levels, will address the broader cultural and social impacts.

Algorithmic and Computational Processes: This level explores the specific algorithms, such as image conversion scripts or sorting algorithms, and the computational frameworks leveraged within the LOUD ecosystem. For example, analysing how a recommendation algorithm influences the accessibility of digital resources.

Infrastructure: It focuses on the servers, cloud-based services, and micro-services that support the ecosystem. This includes assessing the deployment of server architectures that facilitate scalable data storage and retrieval, like using Amazon Web Services or GitHub Pages for hosting LOUD data.

Data Model: It covers the organisation and structuring of data, including metadata standards and LOUD specifications. An example would be examining how CIDOC-CRM is used in Linked Art.

Representation and Display: This level not only encompasses the JSON-LD outputs for effective data sharing and interlinking but also focuses on how users see and interact with LOUD representations across different platforms – browsers, viewers, and players. It examines the GUIs that make digital collections accessible and engaging. For example, how a virtual gallery allows users to explore a digital exhibit.

In the exploration of LOUD ecosystems, the concepts of immediacy and hypermediacy, as delineated by @bolter_remediation_1999, provide additional insightful perspectives on the role of interfaces as LoA. Immediacy refers to the design of interfaces that aim to create a seamless, transparent UX, making the technology invisible and allowing for direct interaction with the content. This is evident in interfaces such as OSD, which displays high-resolution images and strives to provide a smooth and immersive viewing experience by minimising the perceptibility of the API-compliant resource. On the other hand, hypermediacy emphasises the presence and visibility of the medium, drawing attention to the various forms of mediation. Interfaces embodying hypermediacy offer a multi-layered, heterogeneous presentation, making users aware of the different media elements and their interactions, e.g.
Exhibit for storytelling purposes. This duality enriches the user’s digital encounter, underscoring the need for LOUD-compliant tools and services to mediate these experiences effectively. By embracing these concepts, the framework can strategically leverage interfaces to either conceal or reveal the intricacies of the digital medium, facilitating a nuanced engagement with information ecosystems. Figure 4.2 presents a comprehensive view of the LOUD ecosystem’s LoA, enriched by the inclusion of societal implications as a cross-cutting dimension and the incorporation of immediacy and hypermediacy as critical concepts at the representation and display level for understanding user interaction and interface design.

Figure 4.2: Exploring Levels of Abstraction in LOUD Ecosystems: Integrating Societal Implications with the Concepts of Immediacy and Hypermediacy

The SLMS scheme, which includes a framework for each identified LoA, equips this research with a comprehensive lens for analysing computational systems, revealing how they can be effectively combined with ANT. This methodology, as depicted in Figure 4.3, enriches the exploration of digital ecosystems. This combination offers a unique perspective on understanding the interplay between computational entities and the broader networks they inhabit. The SLMS scheme can be summarised as follows:

System: This refers to any entity or collection of entities that can be studied. It could be anything from a physical object or a biological entity to a social or computational system.

LoA: Floridi emphasises the importance of levels of abstraction in understanding and analysing systems. A level of abstraction is a way of observing a system, focusing on certain aspects while ignoring others. It is a conceptual framework or lens through which we can understand complex systems.

Model: At each level of abstraction, we create models of the system. These models are simplifications or representations of the system that highlight certain features while omitting others, based on the chosen level of abstraction.

Structure: This refers to the organisation or arrangement of the components within the system, as understood or represented at a given level of abstraction.

Figure 4.3: The SLMS According to [@floridi_method_2008]

@gobbo_what_2016 expand on the SLMS scheme by highlighting the challenges and intricacies of quantifying and qualifying computational information. They advocate for a comprehensive methodology that appreciates both the physical and conceptual dimensions of data, facilitating a deeper understanding of programmable artefacts and their informational content. This perspective not only complements the analytical capabilities of ANT but also opens new avenues for investigating the dynamics of information and technology. As I venture to revise the epistemological foundations and introduce Haraway’s concept of SK, it becomes increasingly manifest that the integration of ANT with Floridi’s PI and the insights of computational information theory provides a robust framework for exploring the complexities of digital and networked environments. This interdisciplinary approach lays the groundwork for a comprehensive exploration of the digital world, emphasising the importance of situated, contextual knowledge in understanding and navigating the digital landscape.

4.4 Epistemological Foundations

This section establishes the epistemological foundations, presenting ANT, BO, and PI, alongside Donna Haraway’s SK.
Rather than synthesising these theories, this chapter places an emphasis on situating LOUD within a feminist perspective to construct a new materialistic foundation reminiscent of Haraway’s approach [@haraway_staying_2016 p. 42]. This assemblage seeks to navigate the controversies and mappings within LOUD-like communities, applying a Tardian approach to trace the spreadability of ideas [@latour_whole_2012]. To analyse the relevant actor-networks effectively, especially as I am part of both the IIIF and Linked Art communities, a particular lens is required. ANT, while expansive, has faced criticism for its perceived flatness in analysing networks. Here, Haraway’s SK becomes instrumental, providing a stance that enriches the ANT-grounded theoretical framework with a comprehensive lens that prioritises context in shaping knowledge. SK emphasises that knowledge is always situated, partial, and contextually produced, offering a critical perspective on determining relevance within networks. SK complements ANT by adding depth to the analysis of actor-networks. It highlights the significance of context – both human and non-human – in the production of knowledge, thereby enriching the theoretical framework with a nuanced lens for exploring the dynamics within the LOUD space. SK, as articulated by @haraway_situated_1988, emphasises the contextual nature of knowledge and challenges the pursuit of an objective, universal truth divorced from the position of the knower. Haraway’s framework, which integrates standpoint theory, holds that knowledge is inherently shaped by its social, cultural and historical context, and supports an understanding of knowledge as partial and situated. This approach, noting the influence of epistemological privilege and intersectionality, argues against universalism by stressing the importance of being conscious of the specific perspectives and biases that inform one’s understanding. The relevance of SK to ANT lies in its complementary perspective of acknowledging the diverse, context-specific factors that influence knowledge production within networks, enriching ANT's analysis of actor-networks by incorporating a critical, reflexive lens on the situatedness of knowledge. In forging this theoretical framework, I seek to transcend the notion of merely disparate ideas. Instead, I aim to weave their contributions into a coherent web of thought, ensuring a seamless and comprehensive framework that embodies the essence of their respective insights, one that transcends disciplinary boundaries. The theoretical framework can be synthesised as follows, interweaving distinct yet complementary perspectives to enrich our understanding of socio-technical ecosystems:

SK, by advocating for an understanding of knowledge as inherently partial and situated, complements ANT by adding depth to the analysis of actor-networks. It enriches the theoretical framework with a nuanced lens for exploring the dynamics within the LOUD space.

BOs further refine this framework by offering a means to characterise actors and mediate interactions within the network, facilitating connectivity and translation among diverse groups.

PI enriches ANT by framing information as a fundamental element in actor-network interactions. It offers insights into the different LoA, or different actor-networks, each with a different perspective or granularity, into how information is created, shared, and used, and into how these processes influence the relationships and dynamics within networks.
Figure 4.4 illustrates how these theories intertwine to form the epistemological foundation of this research, demonstrating the synergistic potential of combining ANT with SK, BO, and PI to navigate the complexities of digital ecosystems in the CH field.

Figure 4.4: A Sympoiesis of Theories: ANT Entangled with SK, BO, and PI

Concluding this chapter, the developed toolbox lays a coherent foundation for empirical research, poised to explore the dynamics within actor-networks. This exploration is not entirely novel in the context of CH, as exemplified by @guillem_faire_2023’s use of ANT to spotlight the keystones that were destroyed by the fire at Notre-Dame de Paris. As this narrative unfolds into the empirical chapters, the ANT-based toolbox, enriched with amodern and feminist perspectives, will be instrumental in navigating the forthcoming empirical landscapes.

5. Research Scope and Methodology

This chapter delineates the Research Scope and Methodology, laying the groundwork for the empirical exploration within this thesis. (…)

6. The Social Fabrics of IIIF and Linked Art (…)

7. PIA as a Laboratory (…)

8. Yale’s LUX and LOUD Consistency (…)

9. Discussion

[We] must abandon the idea of syntactic or structural interoperability achieved through the use of a single model, whether for production, storage or exploitation within the very [information system] itself. [@poupeau_reflexions_2018] [113]

This chapter presents a comprehensive discussion where I interpret, analyse and critically examine my findings in relation to the thesis and the wider application of LOUD. Through an in-depth analysis of the design principles of LOUD and their implications for CH, this discussion aims to demonstrate the many challenges and opportunities inherent in this framework. The focus is on achieving community-driven consensus, rather than simply pursuing technological breakthroughs. The following sections are organised to provide a comprehensive review of the empirical findings, an evaluation abstracting LOUD, and a retrospective analysis of the research journey. Firstly, in Section 9.1, I will present a summary of the empirical findings from my research. This will include key themes and insights, structured to reflect the different areas of study and practice within LOUD. Secondly, in Section 9.2, I will provide an evaluation of LOUD using the LoA approach. This evaluation will focus on the impact of LOUD on the perception of Linked Data within the CH domain and the wider DH field. It will include the key themes and insights that have emerged, structured in a way that reflects four levels of abstraction. I will also explore the dual nature of LOUD implementation, involving both simplicity and complexity, and discuss the various factors that influence such dynamics. Finally, in Section 9.3, I will offer a retrospective analysis of the research journey. This section will interpret the findings to situate the LOUD standards as fully-fledged actors. It will reflect on the challenges, achievements, and lessons learned throughout the research process, providing a holistic view of the project’s trajectory and its implications for the future of LOUD.

9.1 Empirical Findings

This section summarises the empirical findings of my research and already offers some suggestions. The structure does not follow the exact order of the three empirical chapters but is organised around overarching topics that emerged throughout the study.
The seven topics include Community Practices and Standards, Inclusion and Marginalised Groups, Maintenance and Community Engagement, Interoperability and Usability, Future Directions and Sustainability, Digital Materiality and Representation, as well as Challenges of Scaling and Implementation.

Community Practices and Standards

GitHub serves as a vital hub for community involvement, with a core group of active contributors often attending meetings regularly. This platform simplifies decision-making within the community, although it also reflects biases similar to those in FLOSS communities. Behind visible activities like meetings, there is substantial preparatory work managed by co-chairs, editorial boards, or driven by community-generated use cases. This foundational work often determines the direction and outcomes of formal gatherings. The LUX project at Yale, as seen in Chapter 8, has successfully fostered collaboration across various units, bringing together libraries and museums on a unified platform. The technological foundation of LUX, based on open standards, facilitates data integration and cross-collections discovery. Not only does the deployment of FLOSS tools contribute to these achievements, but it also emphasises the social advantages of working collaboratively. The concept of the Tragedy of the Commons, as described by @hardin_tragedy_1968, highlights the potential for individual self-interest to deplete shared resources. However, @ostrom_governing_1990 offers a counterpoint by demonstrating how communities can successfully manage common resources through collective action and shared norms. In this context, initiatives like CHAOSS[114] play a significant role by providing metrics that help evaluate the health and sustainability of open source communities. These metrics include contributions, issue resolution times, and community growth, offering valuable insights into how collaborative efforts can be maintained and improved. Reaching consensus is another critical aspect of community practices and standards. While the minutes of meetings are valuable artefacts, they often reflect an Anglo-Saxon approach to decision-making characterised by few substantive points and critical turning points. The formal aspects of conversations captured in minutes do not fully encompass the decision-making process, which frequently involves informal conversations, consensus-building through open dialogue, and subtle cues that influence outcomes. These elements are integral to the English and American approach and hold valuable lessons for an international community. IIIF and Linked Art are international communities, but decisions are made in English and the majority of participants are based in North America and the UK, significantly imprinting this approach. Understanding these nuances can help us improve our collaborative efforts within the IIIF and Linked Art communities. By recognising and appreciating these different facets of decision-making, we can learn from each other and enhance our collective ability to make effective and inclusive decisions. Some of the challenges associated with these practices include the major demand on resources for community building, the slowness inherent in distributed development, and the difficulty in achieving consensus. Additionally, the concept of social sustainability can be seen as an imaginary construct that papers over differences, as discussed by @fitzpatrick_generous_2019.
Addressing these challenges is crucial for the long-term success and effectiveness of the IIIF and Linked Art communities.

Inclusion and Marginalised Groups

The demographic homogeneity in these communities can perpetuate biases and neglect issues relevant to underrepresented or marginalised groups, as seen in Chapter 6. Participation in these standardisation processes is itself a privilege. The assumption that internet access and digital devices are universally available is critically examined, revealing key actors in the digital landscape. This mirrors issues within the IIIF community, where generating IIIF resources presupposes means that may not be accessible to all. We need clear terms of inclusion, as highlighted by @hoffmann_terms_2021. She argues that effective inclusion requires a critical examination of the frameworks and conditions under which inclusion is offered. The framework should ensure that inclusion initiatives do not merely add diversity to existing power structures but work to transform these structures fundamentally. This involves questioning who defines the terms of inclusion, who benefits from them, and who may be inadvertently excluded. @hoffmann_terms_2021 suggests a participatory approach, where marginalised communities are actively involved in shaping inclusion policies and practices, thus making inclusion an ongoing, reflective process rather than a static goal. The inclusion of marginalised groups is a necessary step, but it is not sufficient. To truly make a difference, there must be a strategic and concentrated effort to appropriate technologies, as emphasised by [@morales_apropiacion_2009; @morales_imaginacion_2017; @morales_apropiacion_2018] and further articulated by [@martinez_demarco_empowering_2019; @martinez_demarco_digital_2023]. This strategic approach highlights the political significance of challenging dominant neoliberal and consumerist perspectives on technology and individual engagement. @martinez_demarco_digital_2023 underscores the critical importance of focusing on practices that go beyond mere inclusion. Instead, it requires a deep understanding and critical assessment of how technology is intertwined with social, economic, and ideological contexts. It implies a reflective and deliberate process of technology adoption in which individuals creatively tailor technology to their specific needs, beliefs, and interests. Moreover, a key aspect highlighted by @martinez_demarco_digital_2023 is the implicit and explicit critique of a universalist approach to inclusion, which often lends itself to all too easy instrumentalisation. Understanding and studying resistance to inclusion in an oppressive digital transformation context is paramount, particularly given the highly unequal conditions that prevail. In this light, a comprehensive study of the socio-material and symbolic processes and practices involved in embedding technologies into individuals’ lives is needed. This approach also recognises technology as a catalyst for change. It envisions the use of technology to drive meaningful change across multiple dimensions and realities—national, societal, or personal. By focusing on these practices, empowering individuals to navigate and use technology thoughtfully and purposefully becomes a reality, bridging the gap between technological advances and societal progress [@martinez_demarco_empowering_2019].
Maintenance and Community Engagement

The tension between creating advanced specifications and their practical implementation by platforms is evident in the IIIF Cookbook recipes and Linked Art patterns, as discussed in Chapter 6. This ongoing development shows that the community is still finding the best ways to achieve broad adoption and interoperability. The deployment of the Change Discovery API, as illustrated in Chapter 7, demonstrates that establishing such a protocol on top of the IIIF Presentation API is feasible and straightforward. High-level support from leadership, particularly Susan Gibbons as Vice Provost, has been crucial in building trust and ensuring the project’s success as a valuable discovery layer at Yale. This integration of diverse collections through a unified platform, based on open standards, highlights the potential for transforming teaching, learning, and research by leveraging collaborative efforts. The topic modelling exercise in LUX reveals the intricate actor-networks composed of organisations, individuals, and non-human actors. This analysis underscores the importance of ongoing processes and relationships in maintaining and evolving infrastructure, akin to the concept of ‘infrastructuring’. As detailed in Chapter 8, following best practices and guidelines such as the SHARED Principles is essential for better involvement, but it is also crucial to uphold these commitments consistently over the long term to ensure meaningful participation. Among the PIA team members, there were sometimes ‘disconnects between different communities who undertake collaborative research’ [@vienni-baptista_foundations_2023]. This was something we had to navigate and learn from, which was manageable within the context of a laboratory setting. However, for any follow-up projects, or whatever forms the digital infrastructure we built may take, it is imperative that these disconnects be addressed to ensure cohesive and sustained community engagement.

Interoperability and Usability

Within PIA, different APIs have been progressively deployed to meet various requirements while allowing parallel exploration of data modelling. Each API offers unique advantages, but their collective integration promotes semantic interoperability. For example, the IIIF Image API has been instrumental in rationalising image distribution across prototypes, providing efficient access to high-quality digital surrogates and the ability to resize them for different uses.
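To illustrate how the Image API rationalises such access, every image is exposed through a parameterised URI. The pattern below follows the IIIF Image API 3.0 syntax; the server and identifier are hypothetical.

{scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}

# Full image at maximum available size (hypothetical endpoint):
https://iiif.example.org/image/cas-0001/full/max/0/default.jpg

# The same image, resized to fit within a 400 by 400 pixel bounding box:
https://iiif.example.org/image/cas-0001/full/!400,400/0/default.jpg

Because cropping and resizing are expressed in the URI itself, client applications can request exactly the derivative they need without any client-specific configuration on the server side.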
Adherence to LOUD standards and schemas within LUX has generally been positive, although transitioning between versions of a specification can present challenges, highlighting the need to improve the consistency of compliant resources. Linked Art, for instance, has the capacity to generate various insights and sources of truth around different entities. However, additional or entirely new vocabularies from sources like the Getty may need to be used – such as Homosaurus. Complementary to Linked Art, using WADM allows for assertions that go beyond purely descriptive narratives, though it may sacrifice some semantic richness. This complexity in managing vocabularies and maintaining semantic richness directly ties into broader usability considerations within the community. Addressing these usability concerns, Robert Sanderson has suggested focusing on the use of full URIs in Linked Art to ensure computational usability, in contrast to IIIF’s approach of minimising URIs to enhance readability. This difference highlights a fundamental question in usability: balancing readability and computational usability. Understanding developers’ perspectives on these approaches is critical. As a way forward, I would suggest that the IIIF and Linked Art communities focus on further improving the usability of their specifications. This includes conducting comprehensive usability assessments of APIs to evaluate the experiences of new developers versus existing ones, understanding the steepness of the learning curve associated with each API, and guiding improvements in documentation, on-boarding processes, and overall developer support. Efforts should be made to lower the barriers to entry for new developers by developing more intuitive and user-friendly tutorials, providing example projects, and creating a robust support community. Ensuring that developers can quickly and effectively leverage APIs will foster greater adoption. Addressing the challenges of transitioning between different versions of specifications is critical, and developing tools and guidelines that help maintain consistency across versions will reduce friction and ensure smoother updates.

Future Directions and Sustainability

Survey findings, as discussed in Chapter 6, underscore the need for ongoing efforts to develop LOUD standards that foster an inclusive, dynamic digital ecosystem. Future strategies should include creating educational resources and frameworks that support interdisciplinary collaboration and reduce barriers to participation. While the Manifest serves as the fundamental unit within IIIF, the Linked Art protocol can play a similarly central role as a semantic gateway in broader contexts, allowing round-tripping across the APIs. The topic modelling exercise in LUX, detailed in Chapter 8, reveals complex actor-networks of organisations, individuals, and non-human actors, providing insights into the relationships sustaining the LUX initiative. The next steps for Linked Art might involve forming a new consortium independent of a CIDOC Working Group, which could provide the necessary support to sustain the initiative. Alternatively, integrating Linked Art into IIIF as a new TSG and specification could address the discovery challenges within IIIF, as discussed during the birds of a feather session led by Robert Sanderson [see @raemy_notes_2024] at the 2024 IIIF Conference in Los Angeles[115]. Design principles that act as bridges across different disciplines, as proposed by @roke_pragmatic_2022, are crucial. IIIF has demonstrated that this collaborative approach is feasible, and Linked Art could follow in its footsteps. However, achieving this requires increased dedication from passive members and broader adoption of the model and the API ecosystem in the near future.

Digital Materiality and Representation

As explored in Chapter 7, the detailed digital representation of photographic albums, such as the Kreis Family Collection, demonstrates the need to comprehensively capture the materiality of digital objects. This includes the structure and context of images, which are crucial for maintaining their historical and social significance. The implementation of the IIIF Presentation API in creating a detailed digital replica of the Getty’s Bayard Album shows how digital materiality can be enhanced through thoughtful use of technology, but also highlights the scalability challenges for such detailed representations.
Creating these detailed digital representations can be seen as a ‘boutique’ approach, which, while labour-intensive and resource-demanding, is necessary for preserving the integrity and contextual significance of cultural heritage objects. The challenge lies in developing the appropriate means and methodologies to achieve this level of detail consistently. Future endeavours, whether through research projects or collaborative efforts between GLAM institutions and DH practitioners, should aim to address these challenges and create sustainable practices for digital materiality and representation. As Edwards aptly notes: ‘Presentational forms equally reflect specific intent in the use and value of the photographs they embed, to the extent that the objects that embed photographs are in many cases meaningless without their photographs; for instance, empty frames or albums. These objects are only invigorated when they are again in conjunction with the images with which they have a symbiotic relationship, for display functions not only make the thing itself visible but make it more visible in certain ways’. [@edwards_photographs_2004 p. 11] Challenges of Scaling and Implementation As seen in Chapter 6, the IIIF Cookbook recipes and Linked Art patterns reflect the tension between creating advanced specifications and their practical implementation. This gap between ideation and real-world application underscores the challenges faced by the community in achieving broad adoption and interoperability. In Chapter 7, the exploration of APIs like the IIIF Change Discovery API illustrates the practical challenges and potential of scaling these technologies for wider adoption. The successful implementation in PIA demonstrates viability, but also points to the need for continued development and community engagement to fully realise the benefits. Furthermore, assessing the scalability of IIIF image servers, as discussed by @duin_webassembly_2022 and exemplified by the firm Q42 with its edge-based service Micrio[116], highlights the importance of optimising data performance. Erwin Verbruggen noted that ‘optimising data performance in my opinion mens [sic] sending as little data over as needed’[117], emphasising the need for efficient data handling to enhance scalability. This insight reinforces the necessity of continual refinement in scaling digital infrastructure to support broader use and integration. Reflecting on these findings, I would like to assert that continuous participation, particularly for institutions that can afford to be part of initiatives like IIIF-C, is essential. Active members should not only focus on their own use cases but also consider the needs and perspectives of other, perhaps marginalised, groups. Achieving the dual goals of making progress within one community, whether it be IIIF or Linked Art, while also engaging in effective outreach and creating a solid baseline, will benefit everyone in the CH sector and beyond. Addressing where LOUD fits in, how people perceive this new concept or paradigm, and how LOUD differs from Linked Data in general is essential. These questions help to clarify the stages at which themes related to one of the LOUD design principles emerge, crystallise, and potentially disappear. My thesis does not fully resolve these queries but offers insights and hints for further exploration. In conclusion, the empirical findings reveal the richness of the implementation and maintenance of LOUD standards in the CH domain.
From the critical role of community practices and standards to the challenges of achieving interoperability and inclusivity, each theme underlines the complex interplay of social, technical and organisational factors. Section 9.2 will look at the evaluation of LOUD and explore its overall impact, delving into the question of what to do with it, particularly in terms of Linked Data versus LOUD, where my thesis provides pointers rather than definitive answers. 9.2 Evaluation: Abstracting LOUD In this section, I will assess the impact of LOUD within the CH domain and the wider DH field, examining its implications for community practices and semantic interoperability, and secondarily whether LOUD has affected the perception of Linked Data. Referring to Figure 4.2, the following is a descriptive attempt to provide levels of abstraction of LOUD based on my empirical findings, focusing particularly on the deployment of IIIF within PIA and Linked Art within the LUX framework, aside from the data model abstraction level. Representation and Display: For PIA, the implementation of Leaflet provided an immediate and easy-to-integrate viewer to display high-resolution digitised images of CAS photographs. The context is accessible through accompanying metadata and related links on the GUI. Although not LOUD-driven per se, it functions as a mediator through the IIIF Image API. Balancing between immediacy and hypermediacy, the Mirador instance enabled the display of IIIF Presentation API resources with machine-generated annotations. We also incorporated the V3 plug-in to manipulate images[118]. However, we failed to provide a robust authentication layer allowing users to add their own annotations easily, highlighting the limitations of a four-year research project not primarily aimed at tool development but at proposing a participatory system. IIIF-compliant software can aid in this, yet development needs to be community-driven rather than individualised. Exhibit was the only tool used for educational and teaching purposes that was well-received, though integration issues persisted. LUX exemplifies Linked Art hypermediacy, where the structure of the JSON-LD representation drives its GUI, including URL syntax[119]. For both PIA and LUX, JSON-LD serves as an interface for certain users (software developers, data curators, data scientists). While resources can become BOs depending on the viewer, a few inconsistencies can still be overcome and will likely be understood by humans reading the files. Data Model: The data model of IIIF is primarily driven by its design principles and WADM. Its main unit is the Manifest, often a digitisation or representation of a physical object, meaning the Presentation API is key to achieving an acceptable level of interoperability. The Shared Canvas Data Model is still there, baked into the specifications, but from Version 2 of the main APIs onwards one does not really need to know about it to understand how IIIF works; it is a piece of history, though. IIIF is Linked Data, but has no real semantic value and should really not be treated as RDF triples. One could almost say the same about Linked Art, as it is not necessary to fully understand CIDOC-CRM to either start using its model or to deploy an API endpoint. However, some basics do need to be understood, such as the event-based model viewpoint, the important classes and their rdfs:domain and rdfs:range.
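To ground these basics, here is a minimal, hedged sketch of the event-based viewpoint, expressed as a Linked Art JSON-LD document written as a Python dict: rather than a flat date field, the object points to a Production event with its own timespan. All URIs are placeholders, and the property names are Linked Art’s developer-friendly aliases for CIDOC-CRM constructs.

```python
# A placeholder object described through an event, Linked Art style.
linked_art_object = {
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "https://example.org/object/1",
    "type": "HumanMadeObject",        # crm:E22_Human-Made_Object
    "_label": "Photographic print (placeholder)",
    "produced_by": {                   # crm:P108i_was_produced_by
        "type": "Production",          # crm:E12_Production
        "timespan": {
            "type": "TimeSpan",
            "begin_of_the_begin": "1930-01-01T00:00:00Z",
            "end_of_the_end": "1939-12-31T23:59:59Z",
        },
    },
}
```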
However, Linked Art has bent some rules and created some properties and classes to meet the needs of the community. As far as implementation goes, I would suggest directly implementing, and being consistent with, the Linked Art API endpoints rather than starting with the data model, as I see cross-institutional interoperability through interfaces as a more important milestone than data modelling as a pastime for the few specialists. For both PIA and LUX, dedicated data models were created to be consistent with the specifications, with some internal structure and data in LUX serving its own purposes. Beyond what was already realised through the Omeka S JSON-LD API, semantic data in PIA did not go further than templating a few Linked Art resources and the workflow carried out with the University of Oxford; there is no production PIA Linked Art API at the moment. Infrastructure: Serialisation mock-ups and JSON-LD templates on GitHub were the starting point to model IIIF Manifests and Collections for the PIA research project. Laravel and then Omeka S were the two main elements, in two different iterations, that were leveraged to present the IIIF resources. If single-image Manifests were quite easy to serialise, the integration of more detailed representations of photographic albums presented challenges. A more robust infrastructure is definitely needed in the long run, but the one in place was efficient enough in a laboratory setting. Algorithmic and Computational Processes: PIA relied on virtual machines and had the necessary Kakadu licence embedded in our SIPI instance to encode the images. If the former proved difficult, as performance was sometimes an issue, the latter was a good option: serving JPEG2000 images cannot currently rely on FLOSS solutions, which are too slow. The LUX pipeline and the use of MarkLogic as a multi-modal database are examples of the data engineering expertise and outsourcing solutions required for such a platform. Some open source solutions, such as QLever[120], a high-performance SPARQL engine, may also offer some hope to institutions that are not well-funded and need robust knowledge graph-oriented solutions. The dual simplicity and complexity of implementing LOUD specifications and participating in community-led efforts points to the need for a reorientation of research projects. It is essential for these projects to actively engage in community processes rather than intermittently presenting their progress and subsequently withdrawing. This ongoing engagement fosters a more robust and collaborative environment, ultimately contributing to the advancement of shared goals and standards. Such a reorientation necessitates a fundamental change in how universities and GLAM institutions operate, extending their involvement beyond the immediate project scope to ensure sustained participation and impact. Despite the introduction of LOUD, the perception of Linked Data has not evolved significantly. Most software engineers continue to treat resources primarily as JSON, often overlooking the graph structure that underpins Linked Data. For IIIF, this approach is appropriate given its focus on content interoperability and presentation. However, for Linked Art, overlooking the graph structure could be problematic to some extent, as it limits the full realisation of the semantic relationships and rich interconnections that Linked Data can provide.
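The following hedged sketch contrasts the two readings of one and the same stub document: the plain-JSON view that most engineers stop at, and the graph view that surfaces the underlying triples. The record and its URIs are placeholders, and parsing assumes network access to resolve the remote Linked Art context.

```python
import json
from rdflib import Graph  # pip install rdflib (JSON-LD parsing built in)

doc = {
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "https://example.org/object/1",
    "type": "HumanMadeObject",
    "_label": "Photographic print",
}

# Plain-JSON view: direct key access, no RDF machinery involved.
print(doc["_label"])

# Graph view: the same bytes parsed as RDF triples.
g = Graph()
g.parse(data=json.dumps(doc), format="json-ld")
for s, p, o in g:
    print(s, p, o)
```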
This highlights the need for more focused efforts to integrate semantic web principles, particularly in contexts where these principles can significantly improve the quality of data. I have faced challenges in moving many of the models developed within PIA into (beta) production, and the usability requirements of APIs have scarcely been addressed. However, the findings from this thesis should be viewed as starting points rather than conclusive solutions. The unseen aspect of this dissertation is my active involvement in both communities and my attempts to reciprocate this engagement within PIA. Each investigation presented could have warranted a dedicated thesis, indicating the breadth and depth of the topics explored. Ultimately, this work merely scratches the surface of numerous subjects, laying the groundwork for future research and development. The next section will offer a retrospective on the work accomplished during this PhD thesis. It will reflect on the various milestones achieved, the lessons learned, and the potential directions for future research. 9.3 Retrospective: Trudging like an Ant In this retrospective[121], I will offer an analysis of the research journey. This section will interpret the findings to situate LOUD as fully-fledged actors within the CH field. It will reflect on the challenges, achievements, and lessons learned throughout the research process, providing a holistic view of the project’s trajectory and its implications for the future of LOUD. The empirical findings of my research reveal the nuanced interplay between socio-technical practices and implementations, synthesising insights through both thematic and abstract lenses. This dual approach underscores the importance of fostering collaboration and effective decision-making, while addressing biases and promoting inclusivity. The need for ongoing maintenance, interoperability and usability remains paramount, as does the development of educational resources and consortia to sustain initiatives. In addition, capturing digital materiality and addressing scalability challenges are critical to the widespread integration of LOUD standards. These findings lay the groundwork for future research and development aimed at bridging operational applications with more extensive design approaches. How can LOUD be situated as fully-fledged actors within the CH field? Reflecting on the notion of graceful degradation, frequently mentioned during the 2024 IIIF Conference, LOUD specifications embody this concept perfectly. Even if not all embedded patterns of a given API-compliant resource are correctly interpreted or rendered by a client, some of its basic features should still be displayed. This flexibility is crucial for ensuring the broad usability and adaptability of LOUD standards, allowing them to transcend institutional boundaries and serve as robust mediums of knowledge transfer. To paraphrase @poupeau_reflexions_2018’s quote at the beginning of this chapter, there isn’t a unique model for interoperability, but there are definitely best sociotechnical practices to be learned from IIIF and Linked Art. The act of participation prevails over the relatively easy and one-off deployment of specifications for the short term. By using LOUD, CH data can be effectively interlinked with different datasets, resulting in numerous potential benefits. An overriding benefit is the improved discoverability and accessibility of CH resources, facilitating enhanced search and retrieval capabilities.
In addition, the adoption of LOUD promotes seamless data sharing and reuse within academic and memory institutions, fostering a culture of collaboration and interdisciplinary knowledge exchange. This approach not only enhances the overall utility and comprehensiveness of CH repositories, but also promotes collective understanding and appreciation of diverse cultural assets and historical narratives. However, it is essential to critically evaluate the application of LOUD in the context of CH data. While LOUD offers promising prospects for improved data interlinking and accessibility, challenges and concerns persist. The transition to LOUD principles necessitates significant investments in resources, including infrastructure, expertise, and time, which may pose barriers for smaller institutions or those with limited funding. Moreover, ensuring the accuracy, consistency, and quality of Linked Data is a complex task, demanding meticulous attention to detail and ongoing maintenance efforts. Furthermore, issues related to data ownership, rights management, and the potential misuse or misinterpretation of interconnected data should be carefully considered. Standardisation across different CH domains, each with unique data structures, formats, and contexts, may present formidable obstacles to seamless integration. These concerns underscore the need for a nuanced and cautious approach to the implementation of LOUD standards, taking into account the complexity and specificity of CH data and its diverse custodians. This thesis has been a journey of discovering Linked Art and a confirmation that the ethos of IIIF is yet to be fully manifested beyond product implementation. The sense of belonging to a community is an ongoing endeavour, much like the ants in Latour’s metaphor. This dissertation underscores that active participation in community processes is essential to achieving the dual goals of advancing the technological framework for semantic interoperability and fostering an inclusive and collaborative CH ecosystem. 10. Conclusion
For a better understanding of the past,
Our images have to be enhanced,
A new dialogue in three dimensions,
Must have openness at its heart,
For somewhere within the archive
Of our aggregated minds
Are a multitude of questions
And a multitude of answers,
Simply awaiting to be found.
[@mr_gee_day_2023]
This chapter brings to a close the journey undertaken since February 2021, aiming to clearly articulate the answers to the research questions, discuss how the research aligns with the objectives, elucidate the significance of the work, outline its shortcomings, and suggest avenues for future research. I had the privilege of hearing the above poem at EuropeanaTech in The Hague in October 2023. What struck me most, and what I have tried to convey in this thesis, was the powerful dialogue and collective spirit striving to harness the potential of our (digital) heritage. With a sense of conviction after this conference, I approached the next one in Geneva in February 2024 with confidence, believing that I had made a compelling case for the concept of LOUD. When a participant asked how LOUD differed from Linked Data, however, I found myself explaining the socio-technical ethos of IIIF and Linked Art, the richness of the individuals who make them up, the ability to combine these different standards, and the common use cases that emerge from these collaborations. Whether my answer was convincing remains uncertain, but I knew it was too brief.
Perhaps it is here, in this conclusion, that my thoughts can find their full expression. I believe that LOUD should be at the forefront of efforts to improve the accessibility and usability of CH data, an endeavour that is increasingly relevant in a web-centric environment. This paradigm has gained considerable traction, particularly with the advent of Linked Art and the recognition that the IIIF Presentation API has been an inspiration for the LOUD design principles. The development and maintenance of LOUD standards by dedicated communities are characterised by collaboration, consensus building, and transparency. In the interstices of the IIIF and Linked Art communities, frameworks for interoperability are not only exposed, but revealed as profound testaments to the power of transparent collaboration across institutional boundaries. Both communities, it is true, are still very much Anglo-Saxon efforts, where the specifications have mainly been implemented in GLAM and/or DH research projects, or at least in those we have been aware of. Each community has clear guidelines on how to propose use cases, mostly using GitHub, and hides the sometimes unnecessary RDF complexity behind a set of JSON-LD @context definitions. IIIF is at the presentation layer and can really play its role as a mediator, with the Manifest as its central unit connecting other specifications, including semantic metadata, and preferably with simpatico specifications such as Linked Art. An important hypothesis arises from the observation that adherence to the LOUD design principles makes specifications more likely to be adopted. The primary benefit of adopting LOUD standards lies in their grassroots nature. This grassroots approach not only aligns with the core values of openness and collaboration within the DH community but also serves as a common denominator between DH practitioners and CHIs. This unique alignment fosters a sense of shared purpose and common ground. However, it is essential to acknowledge that while LOUD and its associated standards, including IIIF, hold immense promise, their limited recognition in the wider socio-technical ecosystem may currently hinder their full potential impact beyond the CH domain. Consideration of socio-technical requirements and the promotion of digital equity are essential to the development of specifications in line with the LOUD design principles. In the context of the IIIF and Linked Art communities, this means both recognising current challenges and building on existing practices. This includes forming alliances that support diverse forms of inclusion at both project and individual levels. For example, organisations should be encouraged to send representatives from diverse professional and personal backgrounds, such as underrepresented groups or non-technical fields. This can be facilitated by initiatives that lower the barriers to participation, such as financial support for travel and attendance, flexible participation formats, and targeted outreach efforts. Furthermore, as these standards often align with open government data initiatives, they present opportunities for broader public engagement and institutional transparency. In the broader context of DH, understanding LOUD involves tracing the historical development of the field and its evolving relationship with technology. The interdisciplinary nature of DH has always integrated diverse scholarly and technical practices.
In recent years, DH has seen a notable increase in interest in the use of Linked Data and semantic technologies to improve the discoverability and accessibility of CH collections. LOUD’s emphasis on user-centred design and usability aligns well with these goals. Consequently, the principles of LOUD hold great promise for advancing the integration and use of community-driven APIs and/or Linked Data within DH. This can be seen within PIA, where the benefits of implementing IIIF helped us to streamline machine-generated annotations, integrate different thumbnails into GUI prototypes, model photo albums with different layers from the Kreis Family collection, and enable project members and students to engage in digital storytelling, an important participatory facet that can be seamlessly explored by DH efforts and CHIs with the help of the IIIF Image and Presentation APIs. Data reuse is definitely a key LOUD driver, which could have been pursued more extensively with a production instance of Linked Art. As for widening participation, this is definitely a strategic and political decision, rather than a technical one. That said, LOUD specifications can definitely be embedded through strategic citizen science initiatives. A recent example that highlights the comprehensive value of Linked Data was presented by @newbury_linked_2024 at the CNI Spring 2024 Meeting. He delineated its significance as extending well beyond single entities, such as the Getty Research Institute, to enrich a vast ecosystem. Specifically, he identified three principal areas of value: firstly, within the ecosystem itself, where the utility of information is amplified through its application in diverse contexts; secondly, for the audience, by directly addressing user needs and facilitating various conceptual frameworks; and finally, within the community, by enabling wider use and adaptation of data and code. This approach to Linked Data, as articulated by Newbury, not only enhances its utility across these dimensions, but also aligns seamlessly with the LOUD proposition, underscoring a shared vision for a digital space where the interconnectedness and accessibility of (meta)data serve as foundational principles for progress and community engagement. LUX, as a catalyst for LOUD, exemplifies a practical approach to implementing Linked Data that has garnered significant local engagement and support at Yale. This initiative demonstrates how sound socio-technical practices can be effectively applied within a supportive institutional environment. The consistency of the data within LUX aligns well with IIIF and Linked Art standards, with only a few minor adjustments required for full compliance. These quick fixes are manageable and do not detract from the overall robustness of the initiative. While it may be too early to fully assess the wider impact of using LOUD specifications on the LUX platform within the CH domain, the initiative has already attracted considerable interest in recent months. This growing attention suggests that the LUX approach is resonating with other organisations, pointing to the potential for wider adoption and impact. The enthusiastic local engagement at Yale provides a strong foundation for LUX and highlights its potential to serve as a model for similar projects aimed at enriching digital heritage through effective collaboration and agreed-upon standards. In carrying out this thesis, I have adhered to the five main objectives set out at the beginning of the PhD.
These objectives have been accomplished to a high degree, reflecting a substantial and well-executed project. Furthermore, most of the outputs – such as data models and scripts – from this work are available on GitHub, providing open access to the wider community. In addition, I have published several papers, both individually and collaboratively, further disseminating the findings and contributions of this research. Additionally, this thesis is relevant because it sheds light on communities and implementations that can be celebrated not only for their standards but also for their operating ethos; IIIF and Linked Art present models ripe for emulation beyond their immediate digital confines. Here, agency and authority are most typically granted to the collective over the isolated, with each actor – be it an individual, an institution or an interface – intricately interconnected. Yale’s LUX initiative also embodies this ethos, demonstrating how collaborative efforts can lead to innovative solutions and wider impact. It is to be hoped, then, that these practices of openness and multiple partnerships will not be seen as limited to their origins in digital representation. At the very least, I hope that these socio-technical approaches can serve as exemplars or sources of inspiration in broader arenas, where the principles of mutual visibility and concerted action can point the way towards cohesive and adaptive collaborative architectures. Despite its contribution, this thesis is far from perfect and certainly contains several shortcomings. I will name here three significant ones. First, the visualisations included and the use of FOL are primarily designed to support my own self-reflection and may be more beneficial to me than to the broader academic community. While they provide insights into my research process and findings, their applicability and usefulness to others might be limited. Second, the theoretical framework I employed, while instrumental to my research, may not serve as a universally applicable toolbox. Nevertheless, I urge readers to pay close attention to STS methodologies and practices. The works of Bruno Latour, Donna Haraway, and Susan Leigh Star have been invaluable companions throughout this dissertation. Additionally, for those involved in conceptualising semantic information, I recommend exploring Floridi’s PI, which offers profound insights into the nature and dynamics of information. These readings have greatly influenced my approach and understanding, and I believe they can offer valuable perspectives to others as well. Third, while the thesis aims to address both community practices and semantic interoperability, it leans more heavily towards the former. This emphasis on community practices may overshadow the broader discussion of semantic interoperability, potentially limiting the appeal of the thesis to those primarily interested in the technical aspects. Other shortcomings include the broad scope of the thesis, with three empirical chapters exploring different avenues. While this comprehensive approach provides a broad understanding of the research topic, it has also resulted in a rather lengthy thesis. This may be a challenge for the reader, as a topic of interest in one chapter may not be as compelling in another. The diversity of empirical focus, while enriching the research, may dilute the coherence for some readers, making it more difficult to maintain a consistent engagement throughout the dissertation.
Despite these limitations, I hope that the different perspectives and findings contribute to a richer, more nuanced understanding of LOUD for CH. Avenues for future research are numerous and promising. One interesting area to explore is the comparative benefits experienced by early adopters of IIIF and Linked Art specifications versus those who implemented these standards later. Early adopters have the advantage of having their use cases discussed and resolved within the community, and it would be insightful to analyse the long-term impacts on their projects. Such a study is already feasible for early adopters of IIIF, and a comparison with later implementations of Linked Art will become possible within a few years. Furthermore, future exploration could focus on the full implementation of Linked Art within PIA or similar efforts, as well as more performance-oriented testing with the deployed LOUD APIs. These efforts should further validate the robustness and scalability of these specifications. Another important area for future investigation is the participation of institutions and individuals from the Global South in both the IIIF and Linked Art communities. It is crucial to explore how we can better support their uptake of these specifications and encourage their active involvement in these initiatives to ensure a more inclusive and globally representative environment. As I reflect on the journey of this thesis, I am reminded of the powerful dialogue and collective effort that has been at its heart. Mr Gee’s poem resonates deeply with my own aspirations for this work: to enhance our understanding of the past through openness and collaboration, as can be seen in IIIF and Linked Art. As I bring this dissertation to a close, I am filled with a sense of accomplishment and a renewed commitment to promoting sound socio-technical practices. It is my hope that the insights and methodologies presented here will inspire others to engage in this ongoing dialogue, continually asking and answering the many questions that arise as we collectively explore our cultural heritage landscapes. Throughout this dissertation, British English spelling conventions are predominantly observed. However, there are instances of American English spelling where direct quotations from sources are used as well as when referring to names of institutions, standards, or concepts. ↩︎ SNSF Data Portal - Grant number 193788: https://data.snf.ch/grants/grant/193788 ↩︎ Seminar für Kulturwissenschaft und Europäische Ethnologie: https://kulturwissenschaft.philhist.unibas.ch/ ↩︎ DHLab: https://dhlab.philhist.unibas.ch/ ↩︎ HKB: https://www.hkb.bfh.ch/ ↩︎ The considerable size of the ASV collection, which includes over 90,000 analogue objects, reflects not just the work of the main authors but also the contributions from numerous explorers and additional material beyond the maps and primary publications. ↩︎ Max Frischknecht’s PhD: https://phd.maxfrischknecht.ch/ ↩︎ PIA project website: https://about.participatory-archives.ch/ ↩︎ The vision of the PIA project was first written in German and then translated into English and French. ↩︎ In our joint paper, we wrote ‘man-made’, corrected here, which makes me think of the transition within the CIDOC-CRM for the Entity E22 Human-Made Object from version 6.2.7 onward.
↩︎ Knora Base Ontology: https://docs.dasch.swiss/2023.07.01/DSP-API/02-dsp-ontologies/knora-base/ ↩︎ SIPI documentation: https://sipi.io/ ↩︎ IIIF Working Groups Meeting, The Hague, 2016: https://iiif.io/event/2016/thehague/ ↩︎ Van Gogh, Vincent. (1889). Irises [Oil on canvas]. Getty Museum, Los Angeles, CA, USA. https://www.getty.edu/art/collection/object/103JNH ↩︎ Giacometti, Alberto. (1956). L’homme qui marche I [Sculpture]. Carnegie Museum of Art, Pittsburgh, PA, USA. https://www.wikidata.org/entity/Q706964 ↩︎ UNESCO World Heritage List: https://whc.unesco.org/en/list/ ↩︎ Blue Shield International: https://theblueshield.org/ ↩︎ The ICBS was founded by the ICA, ICOM, ICOMOS, and IFLA. ↩︎ Guro. (1900-1950). Male Face Mask (Zamble) [Wood and pigment]. Art Institute of Chicago, Chicago, IL, USA. https://www.artic.edu/artworks/239464 ↩︎ I have opted for the term ‘affordance’ and not ‘representation’ as my intention is to maintain a comprehensive scope that encompasses various modalities such as modelling endeavours. ↩︎ To some degree, parallels can be drawn between the distinctions of cultural and digital heritage with those drawn between the humanities and DH. ↩︎ Inicio - Museos Comunitarios de América: https://www.museoscomunitarios.org/ ↩︎ The descriptions of each of these nine dimensions are selected excerpts from @star_ethnography_1999. ↩︎ A PID is a long-lasting reference to a digital resource. It usually has two components: a unique identifier and a service that locates the resource over time, even if its location changes. The first helps to ensure the provenance of a digital resource (that it is what it purports to be), whilst the second will ensure that the identifier resolves to the correct current location [@digital_preservation_coalition_persistent_2017]. ↩︎ Rijksmuseum: https://www.rijksmuseum.nl/ ↩︎ In the original version, these instances contained typographical or factual errors. They have been struck through and corrected here. ↩︎ ↩︎ ↩︎ @zeng_metadata_2022 [p. 11] articulate that ‘as with “data”, metadata can be either singular or plural. It is used as singular in the sense of a kind of data; however, in plural form, the term refers to things one can count’. In the context of this thesis, I have chosen to favour the plural form of (meta)data. However, I acknowledge that I may occasionally use the singular form when referring to the overarching concepts or when quoting references verbatim. ↩︎ The snapshot of this bibliographic record was taken from https://swisscovery.slsp.ch/permalink/41SLSP_UBS/11jfr6m/alma991170746542405501. ↩︎ Seeing Standards: A Visualization of the Metadata Universe. 2009-2010. Jenn Riley. https://jennriley.com/metadatamap/seeingstandards.pdf ↩︎ A widespread example in the CH domain is the serialisation of metadata in XML, a W3C standard. ↩︎ It is noteworthy that the diversity of metadata standards in the heritage domain, characterised primarily by a common emphasis on descriptive attributes, is not counter-intuitive. This variation reflects the diverse nature of CH resources and the nuanced needs of GLAMs.
↩︎ MARC Standards: https://www.loc.gov/marc/ ↩︎ RDA: https://www.loc.gov/aba/rda/ ↩︎ Though RDA was initially envisioned as the third edition of AACR, it faces the challenge of maintaining a delicate balance between preserving the AACR tradition and embracing the necessary shifts required for a successful and relevant future for library catalogues that can easily be interconnected with standards from archives, museums, and other communities [see @coyle_resource_2007]. ↩︎ MODS: https://www.loc.gov/standards/mods/ ↩︎ METS: https://www.loc.gov/standards/mets/ ↩︎ People might even argue that FRBR is only interesting as an ‘intellectual exercise’ [@zumer_functional_2007 p. 27]. ↩︎ LRMer: https://www.iflastandards.info/lrm/lrmer ↩︎ BibFrame: https://www.loc.gov/bibframe/ ↩︎ EAD: https://www.loc.gov/ead/ ↩︎ ISAD(G): General International Standard Archival Description - Second edition https://www.ica.org/en/isadg-general-international-standard-archival-description-second-edition ↩︎ PREMIS: https://www.loc.gov/standards/premis/ ↩︎ RiC Conceptual Model: https://www.ica.org/en/records-in-contexts-conceptual-model ↩︎ RiC-O: https://www.ica.org/standards/RiC/ontology ↩︎ CDWA: https://www.getty.edu/research/publications/electronic_publications/cdwa/ ↩︎ CCO: https://www.vraweb.org/cco ↩︎ VRA: https://www.vraweb.org/ ↩︎ VRA Core 4.0 and CCO have a symbiotic relationship, with CCO providing data content guidelines and incorporating the VRA Core 4.0 methodology. The latter has also been leveraged in other contexts to form the basis for more granular Linked Data vocabularies [see @mixter_using_2014]. ↩︎ In French, the original language used for this acronym, CIDOC stands for Comité international pour la documentation du Conseil international des musées. ↩︎ LIDO: https://cidoc.mini.icom.museum/working-groups/lido/ ↩︎ CIDOC Working Groups: https://cidoc.mini.icom.museum/working-groups/ ↩︎ CIDOC-CRM: https://cidoc-crm.org/ ↩︎ CRM-SIG Meetings: https://www.cidoc-crm.org/meetings_all ↩︎ CIDOC-CRM V7.1.2: https://www.cidoc-crm.org/html/cidoc_crm_v7.1.2.html ↩︎ For a quick overview of the classes and properties of CIDOC-CRM, I recommend visiting the dynamic periodic table created by Remo Grillo (Digital Humanities Research Associate at I Tatti, Harvard University Center for Italian Renaissance Studies): https://remogrillo.github.io/cidoc-crm_periodic_table/ ↩︎ CIDOC-CRM compatible models and collaborations: https://www.cidoc-crm.org/collaborations ↩︎ At the time of writing, none of these CIDOC-CRM extensions have been formally approved by CRM-SIG. It is also worth mentioning that other extensions based on CIDOC-CRM have been developed by the wider community, such as Bio CRM, a data model for representing biographical data for prosopographical research [see @tuominen_bio_2017] or ArchOnto, which is a model created for archives [see @hall_archonto_2020].
↩︎ CRMact: https://www.cidoc-crm.org/crmact/ ↩︎ CRMarchaeo: https://cidoc-crm.org/crmarchaeo/ ↩︎ CRMba: https://www.cidoc-crm.org/crmba/ ↩︎ CRMdig: https://www.cidoc-crm.org/crmdig/ ↩︎ CRMgeo: https://www.cidoc-crm.org/crmgeo/ ↩︎ CRMinf: https://www.cidoc-crm.org/crminf/ ↩︎ CRMsci: https://www.cidoc-crm.org/crmsci/ ↩︎ CRMsoc: https://www.cidoc-crm.org/crmsoc/ ↩︎ CRMtex: https://www.cidoc-crm.org/crmtex/ ↩︎ FRBRoo: https://www.cidoc-crm.org/frbroo/ ↩︎ PRESSoo: https://www.cidoc-crm.org/pressoo/ ↩︎ Linked Art: https://linked.art ↩︎ DCMI Metadata Terms: https://www.dublincore.org/specifications/dublin-core/dcmi-terms/ ↩︎ Getty Vocabularies: https://www.getty.edu/research/tools/vocabularies/ ↩︎ Mastodon: https://joinmastodon.org/ ↩︎ Homosaurus: https://homosaurus.org/ ↩︎ DOLCE: http://www.loa.istc.cnr.it/dolce/overview.html ↩︎ It must be noted though that the use of DLs in KR predates the emergence of ontological modelling in the context of the Web, with its origins going back to the creation of the first DL modelling languages in the mid-1980s [@krotzsch_description_2013]. ↩︎ LinkedDataGPT: https://ld.gpt.liip.ch/ ↩︎ Neo4j: https://neo4j.com/ ↩︎ GB, TB, and PB are units of digital information storage capacity. 1 GB is equal to 1,000,000,000 ($10^{9}$) bytes, 1 TB is equal to 1,000,000,000,000 ($10^{12}$) bytes, and 1 PB is equal to 1,000,000,000,000,000 ($10^{15}$) bytes. If a standard high-definition movie is around 4-5 GB, then 1 PB could store roughly 200,000 movies. In 2011, @gomes_survey_2011 [p. 414] reported that the Internet Archive held 150,000 million contents of archived websites – crawled through the Wayback Machine – or approximately 5.5 PB. As of December 2021, it was about 57 PB of archived websites and a total used storage of 212 PB, see https://archive.org/web/petabox.php. ↩︎ In this context, UX is understood as an umbrella term encompassing both user and/or customer service, emphasising that the focus is on individuals who need or use a given service, regardless of their categorisation as users or customers. ↩︎ According to @nargesian_data_2019 [p. 1986], a data lake is a vast collection of datasets that has four characteristics: it can be stored in different storage systems, exhibit varying formats, may lack useful metadata or use differing metadata formats, and can change autonomously over time. ↩︎ An interesting initiative in this area is the use of RAIL, which empowers developers to restrict the use of AI in the software they develop to prevent irresponsible and harmful applications: https://www.licenses.ai/ ↩︎ Common Objects in Context: https://cocodataset.org/ ↩︎ Viscounth – A Large Dataset for Visual Question Answering for Cultural Heritage: https://github.com/misaelmongiovi/IDEHAdataset ↩︎ Artificial Intelligence for Libraries, Archives & Museums: https://sites.google.com/view/ai4lam ↩︎ AEOLIAN Network: https://www.aeolian-network.net/ ↩︎ Newspaper Navigator: https://news-navigator.labs.loc.gov/ ↩︎ @perrigo_exclusive_2023 reported that Kenyan workers were paid less than USD 2 an hour to identify and filter out harmful content for ChatGPT. ↩︎ FOSTER Plus (Fostering the practical implementation of Open Science in Horizon 2020 and beyond) was a 2-year EU-funded project initiated in 2017 with 11 partners across 6 countries. Its main goal was to promote a lasting shift in European researchers’ behaviour towards Open Science becoming the norm.
↩︎ The Open Knowledge Foundation, a non-profit network established in 2004 in the U.K. which aims to promote the idea of open knowledge, sets out some principles around the concept of openness and defines it as follows: ‘Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness)’. https://opendefinition.org/ ↩︎ Phrenosis phronesis[26:2] in philosophy is related to ‘practical understanding; wisdom, prudence; sound judgement’ [@oxford_english_dictionary_phronesis_2023] ↩︎ Zooniverse: https://www.zooniverse.org/ ↩︎ FromThePage: https://fromthepage.com/ ↩︎ FAIR Principles: https://www.go-fair.org/fair-principles/ ↩︎ FAIR Signposting: https://signposting.org/FAIR/. Signposting focuses on expressing the topology of digital objects on the web with a view to increasing the FAIRness of scholarly objects in a distributed manner [@van_de_sompel_fair_2023]. ↩︎ CARE Principles for Indigenous Data Governance: https://www.gida-global.org/care ↩︎ The Santa Barbara Statement on Collections as Data: https://collectionsasdata.github.io/statement/ ↩︎ British Library’s Research Repository: https://bl.iro.bl.uk/ ↩︎ Data Foundry – Data collections from the National Library of Scotland: https://data.nls.uk/ ↩︎ LoC Labs Data Sandbox: https://data.labs.loc.gov/ ↩︎ Royal Danish Library’s Mediestream: https://www2.statsbiblioteket.dk/mediestream/avis ↩︎ Meemoo’s Art in Flanders: https://artinflanders.be/ ↩︎ BVMC Labs: https://data.cervantesvirtual.com/ ↩︎ DATA-KBR-BE: https://www.kbr.be/en/projects/data-kbr-be/ ↩︎ A Checklist to Publish Collections as Data in GLAM Institutions: https://glamlabs.io/checklist/ ↩︎ The birth of the Web: https://home.cern/science/computing/birth-web ↩︎ All general-purpose servers must support the methods GET and HEAD. All other methods are optional. ↩︎ Schema.org: https://schema.org/ ↩︎ [@wood_linked_2014 p. 35] ↩︎ FOL, also known as first-order predicate logic or first-order predicate calculus, is a formal system of symbolic logic used in mathematics, philosophy, and computer science. It is a logical framework for expressing and reasoning about statements involving objects and their properties and relationships. In FOL, statements are represented using variables, constants, functions, and predicates. It allows for the quantification of variables and the formulation of statements such as ∀ (for all) and ∃ (there exists), which enable the expression of universal and existential quantification. As such, FOL can express facts concerning some or all of the objects in the universe. Its epistemological commitment, i.e. what an agent believes about facts, is concentrated on what is true, false, or unknown [see @russell_artificial_2010 pp. 285 ff.]. ↩︎ It must be noted that DL – a subset of FOL, briefly introduced in 3.2.4.4 – has a more restricted syntax and semantics tailored for ontology modelling. ↩︎ IRI is an extension of URI that allows for the use of international characters and symbols in web addresses. ↩︎ JSON-LD will be discussed through examples in . ↩︎ Author’s translation: ‘We need to give up on the idea of syntactic or structural interoperability through the use of a single model, whether for producing, storing or managing data within an information system’.
↩︎ CHAOSS: https://chaoss.community/ ↩︎ IIIF Annual Conference and Showcase - Los Angeles, CA, USA - June 4-7, 2024: https://iiif.io/event/2024/los-angeles/ ↩︎ Micrio: https://micr.io/ ↩︎ Message written on the IIIF Slack Workspace on 28 October 2022. ↩︎ mirador-image-tools: https://github.com/ProjectMirador/mirador-image-tools ↩︎ For instance, this user interface view of Claude Monet (1840-1926): https://lux.collections.yale.edu/view/person/642a0152-1567-4fbe-93f3-66f11c5cab9a and its Linked Art counterpart: https://lux.collections.yale.edu/data/person/642a0152-1567-4fbe-93f3-66f11c5cab9a ↩︎ QLever: https://github.com/ad-freiburg/qlever ↩︎ The title of the section is an homage to Bruno Latour and a passage found in his book ‘We have never been modern’. ↩︎",
+ "content_html": "Linked Open Usable Data for Cultural Heritage: Perspectives on Community Practices and Semantic Interoperability PhD Thesis in Digital Humanities, completed as part of the Graduate School of Social Sciences’ (G3S) doctoral programme. It was successfully defended on 18 November 2024 (slides). This page will host a lightweight HTML version of my thesis, optimised for easy access and readability. The PDF version (e-dissertation) is available on the University of Basel’s repository: https://doi.org/10.5451/unibas-ep96807. Page in construction (please be patient ⌛) Author Dr. Julien A. Raemy (University of Basel) https://orcid.org/0000-0002-4711-5759 Supervisors Prof. Dr. Peter Fornaro (University of Basel) https://orcid.org/0000-0003-1485-4923 Prof. Dr. Walter Leimgruber (University of Basel) Dr. Robert Sanderson (Yale University) https://orcid.org/0000-0003-4441-6852 Abstract Digital technologies have fundamentally transformed how Cultural Heritage (CH) collections are accessed and engaged with. Linked Open Usable Data (LOUD) specifications, including the International Image Interoperability Framework (IIIF) Presentation API 3.0, Linked Art, and the W3C Web Annotation Data Model, have emerged as web standards to facilitate the description and dissemination of these valuable resources. Despite the widespread adoption of IIIF, implementing LOUD specifications, particularly in combination, remains challenging. This is especially evident in the development and assessment of infrastructures, or sites of assemblage, that support these standards. This research is guided by two perspectives: community practices and semantic interoperability. The first perspective assesses how organizations, individuals, and apparatuses engage with and contribute to the consensus-making processes surrounding LOUD. By examining these practices, the social fabrics of the LOUD ecosystem can be better understood. The second perspective focuses on making data meaningful to machines in a standardized, interoperable manner that promotes the exchange of well-formed information. This research is grounded in the SNSF-funded project, Participatory Knowledge Practices in Analogue and Digital Image Archives (PIA) (2021–2025), which aims to develop a citizen science platform for three photographic collections from the Cultural Anthropology Switzerland (CAS) archives. Actor-Network Theory (ANT) forms the theoretical foundation, aiming to describe the collaborative structures of the LOUD ecosystem and emphasize the role of non-human actors. Beyond its implementation within the PIA project, this research includes an analysis of the social dynamics within the IIIF and Linked Art communities and an investigation of Yale’s Collections Discovery platform, LUX. The research identifies socio-technical requirements for developing specifications aligned with LOUD principles. It also examines how the implementation of LOUD standards in PIA highlights their potential benefits and limitations in facilitating data reuse and broader participation. Additionally, it explores Yale University’s large-scale deployment of LOUD standards, emphasizing the importance of ensuring consistency between Linked Art and IIIF resources within the LUX platform for the CH domain. The core methodology of this thesis is an actor- and practice-centered inquiry, focusing on a detailed examination of specific cosmologies within LOUD-driven communities, PIA, and LUX. 
This micro-perspective approach provides rich empirical evidence to unravel the intricate web of cultural processes and constellations in these contexts. Key empirical findings indicate that LOUD enhances the discoverability and integration of data in CH, requiring community-driven consensus on model interoperability. However, significant challenges include engaging marginalized groups, sustaining long-term participation, and balancing technological and social factors. Strategic use of technology and the capture of digital materiality are critical, but LOUD also poses challenges related to resource investment, data consistency, and the broader implementation of complex patterns. LOUD should lead efforts to improve the accessibility and usability of CH data. The community-driven methodologies of IIIF and Linked Art inherently foster collaboration and transparency, making these standards essential tools in evolving data management practices. Even for institutions and projects that do not adopt these specifications, the socio-technical practices of LOUD offer vital insights into effective digital stewardship and strategies for community engagement. Keywords: Actor-Network Theory; Community of Practice; Cultural Anthropology Switzerland; Cultural Heritage; Digital Infrastructure; International Image Interoperability Framework; Knowledge Practices; Linked Art; Linked Data; LUX; Participatory Archives; Photographic Archives; Semantic Interoperability; Web Annotation Data Model
Table of Contents
Introduction
Context
Interlinking Cultural Heritage Data
Exploring Relationships through an Actor-Network Theory Lens
Research Scope and Methodology
The Social Fabrics of IIIF and Linked Art
PIA as a Laboratory
Yale’s LUX and LOUD Consistency
Discussion
Conclusion
1. Introduction Since its inception in 2011, IIIF has revolutionised[1] the accessibility of image-based resources. Initially driven by the needs of manuscript scholars, IIIF focused on two-dimensional images, but has since expanded to encompass a wide range of image-based resources, including audiovisual materials and, in the near future, 3D images. Similarly, Linked Art, formally established in 2017, initially concentrated on art museum objects but has since broadened its scope to model a variety of CH entities, leveraging CIDOC-CRM, a renowned ontology in the museum and DH space. Both initiatives aim to break down silos: IIIF focuses on improving the presentation of digital objects, and both enhance their dissemination. Together, they make CH data more accessible through IIIF and more meaningful to machines through Linked Art. These efforts have primarily benefited the CH domain. A key commonality is that the main APIs these communities create align with the LOUD design principles, either intentionally or as empirically demonstrated through use cases. These principles enable software developers to develop compliant tools and services without needing to fully understand RDF, a framework for representing information on the web. Additionally, they may not need to grasp all LOD principles, which promote the interlinking of data from diverse datasets using KOS such as thesauri. WADM, a W3C standard, is also recognised as a LOUD specification. It provides a framework for creating interoperable annotations on web resources, facilitating the linking and sharing of data across different platforms and applications.
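Because WADM’s simplicity is part of the argument, a minimal sketch may help; this is a generic, hedged example with placeholder URIs rather than an excerpt from any project discussed here.

```python
# A minimal Web Annotation: a body-target pair serialised as JSON-LD,
# written as a Python dict. Only the @context is the normative W3C one.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "https://example.org/anno/1",
    "type": "Annotation",
    "motivation": "commenting",
    "body": {
        "type": "TextualBody",
        "value": "A comment on a detail of the image",
        "format": "text/plain",
    },
    # the target can be any web resource, here a region of a IIIF
    # Canvas addressed with a media fragment
    "target": "https://example.org/iiif/canvas/p1#xywh=120,80,640,480",
}
```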
These LOUD design principles include the right abstraction for the audience, few barriers to entry, comprehensibility by introspection, documentation with working examples, and few exceptions in favour of many consistent patterns. Additionally, both IIIF and Linked Art are driven by vibrant communities, mainly comprising GLAM and higher education institutions. While the standards and principles discussed have broad applications, it is important to clarify the scope of this dissertation. This work does not focus on KGs or the assessment of triplestores – databases specifically designed to store and retrieve triples, the fundamental data structures in RDF. Similarly, it does not deal with evaluating SPARQL engines, which are specifically designed to query KGs. Additionally, this dissertation does not address the intersection of ML and IIIF, or the ontological reasoning of Linked Art. Instead, this dissertation concentrates on LOUD, the consistency of its standards, design principles and the vibrant communities behind it. It examines JSON-LD serialisation efforts and the crucial intersection required to establish robust semantic interoperability baselines between presentation and semantic layers. It also presents real-world use case implementations, both on a small scale, in the laboratory and flexible space of the PIA research project, and on a large scale at Yale, exemplified by the LUX platform that provides access to (meta)data from YUL, YCBA, YUAG, and YPM. The focus is therefore on digital infrastructures capable of delivering JSON-LD files from the above specifications, which are primarily, though not exclusively, CH resources. It is more about the different actors – both human and non-human – that create and maintain these interconnected systems and the dynamic interactions that sustain them. The deployment of various LOUD specifications addresses the need for semantic interoperability between CH resources and disparate datasets by establishing a standardised approach to representing and linking data, ensuring that information can be seamlessly shared and understood across different platforms and contexts. This dissertation seeks to carve out a distinct niche by addressing an often-overlooked aspect of IIIF and Linked Art. IIIF is sometimes perceived and studied merely as a service or an appendix, with the content it delivers taking precedence. However, this PhD thesis positions IIIF as a first-class citizen worthy of in-depth study. Similarly, Linked Art, despite its potential and its relatively recent establishment, has been the subject of very few scholarly papers. This gap underscores the significance of LOUD in this context. Furthermore, this thesis elevates Linked Art to a position of primary importance, recognising its significance and advocating for its thorough examination. To thoroughly study LOUD and its adherence to design principles, it is essential to immerse ourselves actively in both communities – an approach I have embraced for years. The thesis also emphasises the importance of participatory efforts and collaboration between research projects, which typically have shorter lifespans, and memory institutions, which need to implement technical standards as a lingua franca. In doing so, it reveals the mediating role of LOUD in advancing the heritage sphere. To truly understand IIIF, Linked Art, and to a lesser extent WADM, it is crucial to examine the social fabrics and consensus decision-making of each community.
Among these considerations are how the specifications can be implemented pragmatically, and how the standards can support the implementation and maintenance of more extensive semantic interoperability efforts. The significance of this research lies in highlighting the commitment and diligence of the individuals and organisations that make up both the IIIF and Linked Art communities. It aims to demonstrate that community-driven practices, such as those exemplified by IIIF and Linked Art, have a potential that goes beyond the mere sharing of digital objects and their associated metadata. The more people who embrace these approaches and implement the associated specifications, the more society as a whole will benefit. Furthermore, this research illustrates that IIIF is no longer limited to two-dimensional images, that Linked Art is not restricted to artworks, and that WADM is a simple, content-agnostic standard that can be easily integrated into a range of systems. This adaptability is a strength of LOUD standards, which are designed to be simple yet effective. LOUD can serve a variety of purposes, primarily rooted in CH, but with the potential to extend its benefits to other sectors. The true beauty of LOUD lies in its ability to foster networking opportunities and transparent socio-technical practices, demonstrating its value beyond mere technical implementation. By emphasising these aspects, this dissertation highlights the wider impact of LOUD in promoting semantic interoperability and enhancing collaborative efforts within the heritage field and beyond. In addition, the implementation of standards through PIA underlines the potential for similar participatory or citizen science projects, while the LUX initiative serves as an illustrative example of robust infrastructure and cross-unit engagement. These examples demonstrate the practical applications and far-reaching implications of adopting LOUD standards in different contexts. This dissertation is structured across ten chapters, each building upon the previous ones up to Chapter 5 to provide a comprehensive understanding of the research. These initial chapters lay the foundation of the study, establishing the context, theoretical framework, and methodological approaches. After this foundational section, Chapters 6, 7, and 8 present empirical studies that, while interconnected, can be read independently if desired. These chapters offer detailed insights into specific aspects of the research and can be appreciated on their own or as part of the broader narrative. The thesis continues with Chapter 2, which extends this introduction by providing more information about the research setting, specifically PIA. Chapter 3 follows with an extensive literature review, offering a comprehensive overview of methods to interlink CH data. Next, Chapter 4 presents the theoretical framework, conceptualised as a toolbox and firmly rooted in ANT, guiding the analysis and discussion throughout the dissertation. Following this, Chapter 5 details the research scope and methodology, explaining the approaches and methods employed in the study. Moving on to the empirical work, Chapter 6 sheds light on the social fabrics of IIIF and Linked Art, exploring the communities and practices that underpin these initiatives. Chapter 7 then examines the implementation of LOUD standards within PIA, highlighting the practical aspects and challenges encountered.
This is followed by Chapter 8, which focuses on the LUX initiative at Yale, examining the underlying governance and interdepartmental ownership of the Yale Collections Discovery platform. The discussion of findings is presented in Chapter 9, where the results from the empirical chapters are synthesised and analysed in relation to the theoretical framework. Finally, Chapter 10 concludes the thesis, summarising the key insights and contributions of the research while outlining potential directions for future study.

2. Context

In this chapter, I will set the stage for my PhD thesis by providing important background information. First, in Section 2.1, I will explain why I chose the title for my thesis. This will give you an understanding of the main focus and the direction of my research. Next, in Section 2.2, I will describe the PIA research project, which is central to my work. This section will cover the project’s goals, significance, and overall framework. In Section 2.3, I will detail my specific contributions to the PIA project. I will emphasise how my work fits into the larger project and its importance to my thesis. Finally, in Section 2.4, I will talk about my active participation in the IIIF and Linked Art communities. This section will highlight how my involvement in these communities has influenced my research and its broader implications.

2.1 PhD Title

I chose the title ‘Linked Open Usable Data for Cultural Heritage: Perspectives on Community Practices and Semantic Interoperability’ as it encapsulates the essence of my research focus, though I could indeed have chosen other ones. During the initial stages of my research, multiple working titles were explored to capture the diverse facets of my interests and objectives. While I was quite sure about having LOUD in the title after the third iteration, I was quite unsure of what should follow and whether a subtitle was actually needed at all. Amidst this dynamic progression, the underlying theme of my research remained steadfast – to delve into the transformative potential of LOUD for CH. I also opted to maintain ‘Cultural Heritage’ in the title of my thesis. While ‘Digital Heritage’ holds its appeal, my choice reflects a broader narrative that acknowledges the crucial role of CHIs and spotlights the multifaceted nature of heritage preservation, encapsulating both its digital facets and the essential contribution of individuals and institutions in curating, interpreting, and making heritage accessible. As for the subtitle, while I do explore CoP as defined by @lave_situated_1991 and @wenger_communities_2011 through investigating the social fabrics of the IIIF and Linked Art communities, my main interest lies in the broader application of LOUD for describing and interlinking CH resources. Thus, I decided to opt for the more generic ‘community practices’ as the first axis or perspective. For the second perspective, I wanted to see how semantic interoperability can be achieved through standards adhering to the LOUD design principles, as they seem to be key enablers for seamless collaboration and knowledge exchange among practitioners. There was a time in my research when I envisaged decoupling community practices and semantic interoperability, perceiving them as two distinct dimensions. However, what really captivates me is the unification of these factors to facilitate collective reasoning for both humans and machines. In summary, this title reflects my enthusiasm for using web-based and community-driven technologies to transform the way we understand, share and value CH.
2.2 The PIA Research Project

I undertook my doctoral studies within the scope of the PIA research project financed by the SNSF under their Sinergia funding scheme from February 2021 to January 2025[2]. The project aimed to analyse the interplay of participants, epistemological orders and the graphical representation of information and knowledge in relation to three photographic collections from CAS. It sought to bring together the world of data and things in an interdisciplinary manner, exploring the phases of the analogue and digital archive from a cultural anthropological, technical and design research perspective [@felsing_community_2023 p. 42]. As part of this endeavour, interfaces were developed to enable the collaborative indexing and use of photographic archival records [@chiquet_participatory_2023 p. 110]. I discuss the interdisciplinary components in more detail and briefly introduce the people involved in the project in Subsection 2.2.1, then talk about the photographic collections that formed the overarching narrative of the research in Subsection 2.2.2, and lastly, in Subsection 2.2.3, the vision that we had put together. The project, divided into three interdisciplinary teams, was led by the University of Basel through the Institute for Cultural Anthropology and European Ethnology[3] (Team A) and the DHLab[4] in collaboration with the DBIS group (Team B), as well as by the HKB[5], an art school and department of the Bern University of Applied Sciences (Team C) [@felsing_community_2023 p. 43]. Table 2.1 lists the people who contributed to the project, broken down by the three teams and their particular perspectives.

Table 2.1: PIA Team Core Members

A) Anthropological perspective:
- Prof. Dr. Walter Leimgruber, Team Leader and Dissertation Supervisor
- Dr. Nicole Peduzzi, Photographic Restoration and Digitisation Supervisor
- Regula Anklin, Conservation and Restoration Specialist (project partner at Anklin & Assen)
- Murielle Cornut, PhD Candidate in Cultural Anthropology
- Birgit Huber, PhD Candidate in Cultural Anthropology
- Fabienne Lüthi, PhD Candidate in Cultural Anthropology

B) Technical perspective:
- Prof. Dr. Peter Fornaro, Team Leader and Dissertation Supervisor
- Prof. Dr. Heiko Schuldt, Dissertation Supervisor (project partner at the University of Basel)
- Dr. Vera Chiquet, Postdoctoral Researcher
- Adrian Demleitner, Software Developer (2021-2023)
- Fabian Frei, Software Developer (2023-2025)
- Christoph Rohrer, Software Developer (2023-2025)
- Julien A. Raemy, PhD Candidate in Digital Humanities
- Florian Spiess, PhD Candidate in Computer Science

C) Communicative perspective:
- Dr. Ulrike Felsing, Team Leader and Dissertation Supervisor
- Prof. Dr. Tobias Hodel, Dissertation Supervisor (project partner at the University of Bern)
- Daniel Schoeneck, Research Fellow
- Lukas Zimmer, Designer (project partner at A/Z&T)
- Max Frischknecht, PhD Candidate in Digital Humanities

2.2.2 Photographic Collections/Archives as Anchors

CAS has historically been engaged in active collaborations that bridge academic research and the public sphere, primarily through traditional analogue methods. The PIA project was created with the intention of exploring the complexities inherent in both analogue and digital approaches, and to encourage and investigate these collaborative endeavours between academia and the wider public. As such, PIA represents a paradigm shift within the scope of projects associated with or supported by CAS, facilitating the seamless integration of digital tools to explore multiple facets of participation and engagement.
This transformative endeavour embodies a profound exploration of new intersections where scholarly endeavours intertwine with the active involvement of citizens. PIA drew on three collections: one focusing on scientific cartography, the Atlas der Schweizerischen Volkskunde (Atlas of Swiss Folklore); a second from the estate of the photojournalist Ernst Brunner (1901–1979); and a third consisting of vernacular photography owned by the Kreis Family (1860–1970).

SGV_05 ASV consists of 292 maps and 1000 pages of commentary published from 1950 to 1995 — an example of such a map is shown in Figure 2.1. This collection originated from an extensive survey of the Swiss population, commissioned by CAS in the 1930s and 1940s, on many issues pertaining, for instance, to everyday life, local laws, superstitions, celebrations or labour [@weiss_atlas_1940]. The contents were compiled by researchers and by people who were described as [6]. Questions were asked about everyday habits, community rights, work, trade, superstitions, and many other topics [@schmoll_richard_2009; @schmoll_vermessung_2009]. This collection offers a snapshot of everyday life in Switzerland right before the beginning of a modernisation process that fundamentally changed lifestyles in all areas during the postwar period. A digitised version of the ASV would not only allow the results of that time to be enriched with further findings [@schranz_critical_2021], but would also make transparent how knowledge was generated in cartographic form through a complex process along different types of media and actors. The restoration, digitisation, cataloguing and indexing efforts all took place throughout PIA under the supervision of Birgit Huber, who extensively based her doctoral research on this particular collection [see @huber_entdeckung_2023].

Figure 2.1: Map from the SGV_05 Collection Relating to Question 93 Showing Walks and Excursions at Pentecost. ASV. CAS. CC BY-NC 4.0

SGV_10 Kreis Family comprises approximately 20,000 loose photographic objects, a quarter of which are organised and kept in 93 photo albums — as illustrated by Figure 2.2. The collection stems from a wealthy Basel-based family and spans the 1850s to the 1980s. This private collection was acquired by CAS in 1991. The collection, which originally arrived in banana cases and was enigmatic due to the lack of clear organisation or accompanying information from the family, posed significant challenges. Despite these initial hurdles, CAS undertook meticulous efforts to catalogue and preserve its contents [@felsing_re-imagining_2024 p. 42]. The pictures were taken by studio photographers as well as by family members themselves. The Kreis Family collection represents a typical example of urban bourgeois culture and gives a comprehensive insight into the development of private photography over the course of a century [@pagenstecher_private_2009]. The photographic materials and formats are very diverse, ranging from prints to negatives, small, medium or large format photographs, black and white or colour. The collection also encompasses many photographic techniques, from one-off daguerreotypes and ferrotypes, to glass-based negatives that could be reproduced en masse, to modern paper prints. While some of the albums and loose images were restored and digitised during the 2014 project, much of this work was completed during PIA and overseen by Murielle Cornut, whose doctoral investigation centred on the study of photo albums [see @cornut_open_2023].
Figure 2.2: A Photo Album Page from the SGV_10 Collection, Bearing the Following Inscription: Botanische Excursion ins Wallis, Pfingster 1928. SGV_10A_00031_015. Kreis Family. CAS. CC BY-NC 4.0

SGV_12 Ernst Brunner is a donation to the CAS archives of about 48,000 negatives and 20,000 prints from Ernst Brunner (1901–1979), a self-taught photojournalist who, mainly in the 1930s and 1940s, documented a wide range of folkloristic themes — as shown by Figure 2.3. He is one of the most important photographers of the era and one of the most outstanding visual chroniclers of Swiss society [@pfrunder_ernst_1995]. His photographs show rural lifestyles, but also urban motifs. In his late work, he led the documentation and research on farmhouses in a specific Swiss district, a project initiated by CAS. Before Ernst Brunner became an independent photojournalist in the mid-1930s, he worked as a carpenter, influenced by the ideas of the Bauhaus and Neues Bauen movements. This can also be seen in the aesthetics and formal language of his photography. While all the black-and-white negatives were digitised and recorded between 2014 and 2018, the digitisation of the prints – a selection made by Ernst Brunner himself – was conducted at the end of the PIA research project. The latter effort was supervised by Fabienne Lüthi, whose PhD was about organisational systems and knowledge practices in the Ernst Brunner Collection.

Figure 2.3: Picture from the SGV_12 Collection Showing Walkers Studying the Train Timetable. [Wanderer studieren den Fahrplan in der Bahnhofhalle]. Lucerne, 1938. Ernst Brunner. SGV_12N_00716. CAS. CC BY-NC 4.0

Whereas each of the PhD Candidates in Cultural Anthropology was assigned a particular collection whose content was, to varying degrees, part of their subject of study, this was not exactly the case for the PhD Candidates in DH, including myself, and in Computer Science. Put differently, we had relative leeway in terms of what interested us in each or all of these three photographic collections. In my case, I briefly explain my contribution to the project in Section 2.3 and then in Chapter 7, as part of the empirical portion of my thesis focusing on the deployment of LOUD specifications using the three CAS photographic collections. Florian Spiess focused on the use of VR through vitrivr, a multimedia retrieval system developed by the DBIS research group at the Department of Mathematics and Computer Science [@spiess_multimodal_2022; @spiess_forschung_2023; @spiess_exploring_2024]. His work included experiments with PIA-related collections, such as the creation of virtual galleries clustered according to content-based similarity [see @peterhans_automatic_2022]. In the case of Max Frischknecht, his doctoral research centred on generative design[7], a methodology to visualise dynamic cultural archives. He mostly worked on the ASV collection and on a mapping tool, a cartographic visualisation designed to explore the CAS photographic archives [see @frischknecht_generating_2022; @eggmann_digitalisierung_2024]. It should also be mentioned that not only did we use the three collections of the CAS photographic archives within the project, but both formal and informal meetings took place most commonly within the photographic archives at the Spalenvorstadt premises in the old Gewerbemuseum and later either at the premises on Allschwilerstrasse, though less frequently, or at Rheinsprung, where the Institute for Cultural Anthropology and European Ethnology is located.
This meant that there was a strong and sometimes blurred entanglement between those involved in the archives and the PIA core team members.

2.2.3 Project Vision

Between December 2021 and March 2022, we worked together to develop and finalise a vision for the project[8]. It includes seven key priorities, or pillars, which were meant to strengthen the interdisciplinary perspectives of PIA. Although ambitious, these elements were of paramount importance to us and served as a guiding blueprint for all PIA activities. Hereafter is a modified version of the vision[9] taken from @cornut_annotations_2023 [p. 4].

Accessibility: developing open interfaces and offering the possibility of expanding the archive and turning it into an instrument of current research that collects and evaluates knowledge with the participation of other users (Citizen Science).

Heterogeneity: making visible where, why and under what circumstances the objects were created, how they were handled and what path they have taken to get to and in the archive. We work on visualisations that take into account the heterogeneous character of archival materials and make their respective biographies visible.

Materiality: conveying the material properties of the objects: they have front and back sides, inscriptions, traces, development errors, they are transparent, multi-layered or fabric-covered. They tell of their origin, use, and peculiarities. We want to make this knowledge accessible and understandable in digital form. To this end, we also consider the necessary infrastructure involved in the creation as part of their narrative: the restoration, the relocation, the indexing, the storage devices, the research tools, the display medium, as well as the process of repro-photography.

Interoperability: a crucial component, achieved by supporting digital means that allow different stakeholders to freely access and interact with the project’s data. Both humans and machines can use, contribute to, correct and annotate the existing data in an open and interoperable manner, thus encouraging exchange and the creation of new knowledge. To do this, we use web-based standards that are widely adopted in the cultural heritage field (one such standard is sketched in the example after this list).

Affinities: leveraging data models and pattern recognition which can uncover semantic relationships between entities that were previously incomplete or difficult for users to access. Using specific interfaces and visualisations, we make it possible to explore digital assets and discover forms of relationships and similarities between images.

AI: facilitating automated searches for simple image attributes such as colour, shapes, and localisation of image components. It should also become possible to recognise texts and object types for extracting metadata.

Bias Management: taking into account that associated metadata was human-made[10] and thus is never objective. Collections and their metadata reflect biases or focus narrowly on selected areas and perceptions. Machines working on the basis of such data automatically reproduce the implicit biases in decision-making due to so-called biased algorithms. Therefore, understanding the data used for training and the algorithms applied for decision making is crucial to ensure the integrity of the application of these technologies in archives. We take ethical issues into account when using AI and visualisations, because the higher the awareness of a possible bias, the faster it can be detected or brought up for consideration with users.
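To illustrate the Interoperability pillar referenced above, the following is a minimal sketch of a W3C Web Annotation (WADM), the kind of standard through which both humans and machines can comment on archival records. It is serialised here with Python’s standard library; the @context is the official WADM one, while the annotation and target URIs are hypothetical placeholders rather than actual PIA identifiers.

```python
import json

# A minimal Web Annotation (WADM) sketch: a user-contributed comment
# on a photograph. Only the @context is normative; the "id" and
# "target" URIs are hypothetical placeholders.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "https://example.org/pia/annotations/1",
    "type": "Annotation",
    "motivation": "commenting",
    "body": {
        "type": "TextualBody",
        "value": "Hikers studying a timetable in a station hall.",
        "format": "text/plain",
    },
    # The target could equally be a IIIF Canvas or an image region.
    "target": "https://example.org/pia/images/0001",
}

print(json.dumps(annotation, indent=2))
```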
As my thesis is notably concerned with semantic interoperability, Interoperability and Affinities are of particular importance to my PhD thesis, although I recognise the importance of all pillars. Each of these resonated with me and my fellow PhD Candidates. As we immersed ourselves in the vision of the PIA research project, it became a unifying thread that brought us together in our research ambitions. We found that all these priorities within the project spoke to us at different points and provided a strong point of communication and practice in the development of processes, prototypes or interfaces.

2.3 Contribution to PIA and its Relevance to the Thesis

To develop a participatory platform, an open and sustainable technological foundation for facilitating the reuse of CH resources was needed [@raemy_applying_2021]. Throughout the PIA project, I was mainly involved in the extension of the data infrastructure and the uptake of IIIF, as well as in designing the data model, leveraging Linked Art and WADM [@raemy_interlinking_2024]. As a member of Team B, I undertook this PhD as a bridge between the different teams, mostly participating in discussions with the three doctoral candidates from Team A to further develop and agree on the CAS data model, and with the software developers from my team to discuss the impact of the data model on our evolving — yet transitory — infrastructure, as well as helping to implement the APIs adhering to the LOUD design principles. It was necessary to redesign the data model within the context of a database migration, from Salsah to the DSP, that happened between November 2021 and March 2024. This updated version, based on the Knora Base Ontology[11], corresponded to the needs of the CAS archives and to some extent to those of PIA, in particular enabling the PhD Candidates in Cultural Anthropology to make more precise assertions, whether in terms of descriptive metadata, in the ability to link one object to another, or in providing comments on these objects in several narrative forms. Moreover, an assessment of the appropriate technical standards for improved usability of the objects by both humans and machines was carried out as a basis for extending the capabilities provided by DaSCH, such as helping the software developers to implement SIPI[12], a C++ image server compatible with the IIIF Image API (a canonical Image API request is sketched below), and to build services that create IIIF Presentation API 3.0 resources. While the theoretical framework of the thesis extends across the scope of PIA, the empirical part focuses on a specific set of findings derived from the research project, presented in Chapter 7. In that chapter, I discuss the data model and its refinement as well as the generation of custom IIIF Manifests during the specific digitisation, cataloguing and indexing efforts that took place throughout the project for the three CAS collections (SGV_05, SGV_10 and SGV_12) under investigation, the implementation of LOUD standards, and the overall design of the technological underpinnings.

2.4 Involvement within the IIIF and Linked Art communities

I must acknowledge the invaluable role that my involvement within the IIIF and Linked Art communities has played in shaping my journey as a trained information specialist and an aspiring DH practitioner. Being an active participant in both communities has not only broadened my understanding of the latest developments in the field but has also profoundly influenced the trajectory of this dissertation.
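As a brief technical aside to the contribution outlined in Section 2.3: an image server such as SIPI resolves requests that follow the IIIF Image API’s canonical URI template, {scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}. The sketch below assembles such requests in Python; the base URL and identifier are hypothetical placeholders, not actual PIA or SIPI values.

```python
# A sketch of IIIF Image API requests, following the canonical URI
# template. The base URL and identifier are hypothetical placeholders.
BASE = "https://example.org/iiif"      # hypothetical image server endpoint
IDENTIFIER = "sample-photograph.jp2"   # hypothetical image identifier

def image_request(region: str = "full", size: str = "!800,800",
                  rotation: str = "0", quality: str = "default",
                  fmt: str = "jpg") -> str:
    """Assemble a IIIF Image API URI for the given parameters."""
    return f"{BASE}/{IDENTIFIER}/{region}/{size}/{rotation}/{quality}.{fmt}"

# The full image, scaled to fit within 800 x 800 pixels:
print(image_request())
# A 1000 x 1000 pixel crop from the top-left corner, rotated 90 degrees:
print(image_request(region="0,0,1000,1000", rotation="90"))
```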
I have been involved within the IIIF community since October 2016 and the Working Groups Meeting that happened in The Hague[13]. This significant journey was, in fact, initiated by a recommendation from my first supervisor, Peter Fornaro, during my time as an undergraduate doing an internship at the DHLab. Little did I know that this recommendation would lead me to carry out a PhD and to look at IIIF not only as community-driven standards but as an object of study. Engaging with the IIIF community exposed me to cutting-edge advances in image interoperability and standards, and fostered a deeper appreciation for the importance of digital representations of cultural heritage. Through collaborative discussions with experts from diverse backgrounds, I gained new perspectives on the potential of technology to advance humanities research and preserve our collective cultural memory. Similarly, my involvement in the Linked Art community introduced me to the opportunities offered by LOUD and its transformative impact on research discourse. Exposure to Linked Data methodologies and the CIDOC-CRM has significantly influenced the way I have structured and interpreted the data in this dissertation, thereby enriching its scholarly breadth and rigour. I started to be actively involved in Linked Art at the beginning of my PhD in 2021, but I had already been following the community by 2020, driven by the efforts of Rob Sanderson, my third supervisor. By mid-2023, I had become a member of the Editorial Board. The individuals I have met and the knowledge shared in these vibrant communities have deeply informed my approach as a scholar. The invaluable connections and collaborations I have made have expanded my network of fellow researchers, educators, and experts, leading to fruitful discussions that have significantly shaped the research questions addressed in this thesis. The events and workshops organised by these communities have also provided immersive learning experiences, giving me first-hand insights into the tools, technologies and methodologies used in the context of describing and disseminating CH data. The dynamic ecosystem of these communities has served as an inspiring backdrop, fostering innovative thinking and encouraging a more holistic approach to my research.

3. Interlinking Cultural Heritage Data

Interlinking CH data is an important aspect of publishing heritage collections over the web, in particular by using LOD technologies to make assertions more easily readable and meaningful to machines [@marcondes_integrated_2021]. Due to the complexity of CH data and their intrinsic inter-relationships, it is necessary to define their nature and introduce controlled vocabularies and ontologies that can be integrated with existing web standards and interoperable with relevant platforms [@bruseker_cultural_2017; @hyvonen_using_2020]. Efforts to interlink CH data have brought about significant advancements, but challenges remain. One such challenge is finding a balance between completeness and precision of expression to ensure that CH data remain accessible and usable to a wider audience. Addressing this challenge, the Linked Open Usable Data (LOUD) design principles and the specifications that adhere to them, such as the IIIF Presentation API 3.0 and Linked Art, offer a promising approach [@raemy_enabling_2023].
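As a glimpse of what adherence to these principles looks like in practice, below is a minimal Linked Art sketch describing a painting, serialised with Python’s standard library. The @context is the official Linked Art one and the AAT concept for paintings is the one commonly used in its documentation, while the object URI and label are hypothetical placeholders.

```python
import json

# A minimal Linked Art sketch of a human-made object (a painting).
# The @context and the AAT concept are taken from the Linked Art
# documentation; the object URI and label are hypothetical.
painting = {
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "https://example.org/object/1",
    "type": "HumanMadeObject",
    "_label": "Example painting",
    "classified_as": [
        {
            "id": "http://vocab.getty.edu/aat/300033618",
            "type": "Type",
            "_label": "paintings (visual works)",
        }
    ],
    "identified_by": [
        {"type": "Name", "content": "Example painting"}
    ],
}

print(json.dumps(painting, indent=2))
```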
By focusing on usability aspects from the perspective of software developers and data scientists involved in designing visualisation tools and data aggregation approaches, LOUD strives to enhance the overall user experience [@sanderson_keynote_2019]. Finding this equilibrium becomes crucial as CH data continue to grow in complexity and size, necessitating the seamless integration of native web technologies. The LOUD concept cultivates an environment that encourages the formation of vibrant CoP, wherein an essential principle is the availability of comprehensive documentation supplemented with practical examples [@raemy_ameliorer_2022]. Moreover, the emphasis on leveraging widely adopted technologies enhances the interoperability of data and promotes its wider dissemination. With LOUD principles guiding the linking of CH data, the resulting web of knowledge becomes more than just a machine-readable resource; it transforms into a user-centric ecosystem where both the accessibility of Linked Data and usability intersect to enable scholars and a wider audience to engage in the exploration and appreciation of CH [@newbury_loud_2018]. Finally, by fostering a collaborative, knowledge-sharing mindset, LOUD empowers software developers to implement data in a robust way, drawing insights from shared experiences [see @page_linked_2020].

In this chapter, which serves as the literature review of the PhD thesis, I attempt to build on this brief introduction by dividing the insights into seven sections in order to provide an overview of the key concepts related to interlinking data in the CH domain. The literature review primarily encompasses works published up until December 2023, providing a comprehensive snapshot of the field’s current state and its evolution. Section 3.1 discusses what makes CH data stand out and Section 3.2 is about CH metadata standards, while Section 3.3 explores the technological trends, scientific movements and guiding principles that have shaped the field. Section 3.4 provides an overview of the web as an open platform, which is essential to understanding the current landscape of interlinking CH data. Section 3.5 focuses on LOUD, while Section 3.6 looks at characterising the community practices and semantic interoperability dimensions for CH. Finally, in Section 3.7, I summarise key elements from each section, give some initial thoughts with respect to LOUD within each of them, and then conclude the chapter with some considerations on why we as a society need to care about CH data.

3.1 What Makes Cultural Heritage Data Stand Out?

Here, I aim to establish the indirect territory of my study, as I am situated on a distinct plane that focuses on web technologies and standards — as well as the software and services that enable them — as the subjects of investigation. However, it is crucial to acknowledge that LOUD specifications owe their existence to the available data that have served as case studies. Thus, their significance can be best understood through the lens of data, and I recognise here the pivotal role played by CH practitioners — encompassing individuals from research and memory institutions — who have had a significant impact on specifying a series of web-based standards and who have helped to advance the discovery of CH data and beyond, in particular resources belonging to the public domain, in an open manner. In Subsection 3.1.1, I provide an introduction to CH as recognised by UNESCO.
I explore the tangible, intangible, and natural dimensions of CH, laying the foundation for further discussions on its representation and preservation, notably by giving a first definition of CH data. Next, in 3.1.2, I look at the challenges of representation and embodiment of CH data. This subsection examines the challenges in describing and preserving its materiality or embodied aspects, and underlines the significance of collective efforts, communities, and the interplay of technologies. Thirdly, in 3.1.3, I discuss what I call ‘Collectives and Apparatuses’, where I highlight how actors, in terms of collaborative actions and apparatuses, play a pivotal role in CH.

3.1.1 Cultural Heritage

The legacy of CH encompasses physical artefacts and intangible aspects inherited from past generations, reflecting the history and traditions of societies. Meanwhile, CH constantly evolves due to complex historical processes, necessitating preservation and protection efforts to prevent its loss over time [@loulanski_revising_2006]. The dynamic nature of CH demands collaborative actions, including documentation and the use of a range of technologies. The concept of CH is also characterised by perpetual evolution, mirroring the historical processes that shape societies over time. Social, political, economic, and technological shifts invariably influence the definition and perception of CH, prompting continuous reinterpretations and reevaluations of its significance. Over the years, the enthusiasm for the protection of cultural property has enriched the term with new shades of meaning. As societies undergo transformations, new layers of meaning and relevance are superimposed on existing CH, perpetually enriching its essence. As articulated by @ferrazzi_notion_2021 [p. 765]:

‘Cultural heritage’, as an abstract legacy or as a merge of tangible and intangible values, is able to encompass the totality of culture(s); in so, assuming a symbolic value that brings a clear break with all other terminologies. In conclusion, ‘cultural heritage’ as a legal term has demonstrated more than any others to be a real ensemble of historical stratification and cultural diversity.

The advent of globalisation and rapid advancements in technology have further accelerated the evolution of CH. Increased interconnectedness and cross-cultural interactions have led to the fusion of traditions and the emergence of novel cultural expressions. Moreover, the digital era has facilitated the dissemination of CH resources on a global scale, transcending geographical barriers and preserving cultural knowledge for future generations [@portales_digital_2018]. Thus, the intriguing nature of CH resources can be attributed to their multifaceted and diverse characteristics. The conservation and promotion of these resources demand a nuanced comprehension of the various types of heritage resources, culminating in effective preservation and promotion strategies that can account for their heterogeneity [@windhager_visualization_2019]. According to @unesco_institute_for_statistics_unesco_2009, CH includes tangible and intangible heritage. Tangible CH refers to physical objects such as artworks, artefacts, monuments, and buildings, while intangible CH comprises practices, knowledge, folklore and traditions that hold cultural significance [@munjeri_tangible_2004]. The concept of heritage has evolved through a process of extension to include objects that were not traditionally considered part of heritage.
The criteria for selecting heritage have also changed, taking into account cultural value, identity, and the ability of the object to evoke memory. This shift has led to the recognition and protection of intangible CH, challenging a Eurocentric perspective and embracing cultural diversity as a valuable asset for humanity [@vecco_definition_2010]. Conservation guidelines have broadened the concept of heritage to include not only individual buildings and sites but also groups of buildings, historical areas, towns, environments, social factors, and intangible heritage [@ahmad_scope_2006]. In another instance, UNESCO defines CH in an even more comprehensive manner, taking into account natural heritage:

Cultural heritage is, in its broadest sense, both a product and a process, which provides societies with a wealth of resources that are inherited from the past, created in the present and bestowed for the benefit of future generations. Most importantly, it includes not only tangible, but also natural and intangible heritage. [@unesco_culture_for_development_indicators_methodology_2014 p. 130]

In thinking about the concept of CH, I find this last definition particularly resonant. This broader perspective is motivated by my interest in LOUD specifications as a research area, particularly because of their notable data agnosticism, and because it resonates with the subdivision of CH proposed by @hyvonen_cultural_2012 [pp. 1-3]. These services have the adaptability to process and use different types of data, transcending the boundaries of specific domains or disciplines. Although grounded in concrete CH cases, their potential to extend to any type of data, including those from STEM, is a compelling prospect that warrants further exploration, a point that I will return to later. The following sub-subsections aim to briefly discuss tangible, intangible, and natural heritage, as well as to provide a definition of CH data which can serve as a foundational reference for this thesis.

3.1.1.1 Tangible Heritage

Tangible CH encompasses physical artefacts and sites of immense cultural significance that are passed through generations in a society [@vecco_definition_2010]. These objects are tangible manifestations of human creativity, representing artistic creations, architectural achievements, archaeological remains, as well as collections held by CHIs. One aspect of tangible CH is artistic creations such as paintings, sculptures and traditional handicrafts. These artefacts embody cultural values and artistic expressions and serve as essential reflections of a society’s collective ethos. For example, artworks such as ‘Irises’ by Vincent van Gogh[14] and Alberto Giacometti’s ‘L’Homme qui Marche I’[15] are revered works of art that have deep cultural significance in Europe and all over the world. The built heritage, including monuments, temples and historic buildings, is another important component of tangible CH. These architectural marvels not only represent past civilisations, but also convey the social values and aspirations of their time. The Taj Mahal, an exemplary white marble structure in India, stands as a poignant testament to Mughal architecture. Closer to where I write this dissertation, one can mention the Abbey of St Gall, a convent dating back to the 8th century which is inscribed on the UNESCO World Heritage List.
In the context of urban heritage, conventional definitions of built heritage often focus narrowly on the architectural and historical value of individual buildings and monuments, which are well protected by existing legislation. However, the challenge is to preserve urban fragments – areas within towns and cities that may not qualify as designated conservation areas, but are of significant cultural and morphological importance [@tweed_built_2007]. For instance, @rautenberg_lemergence_1998 proposes two categories of built CH: heritage by designation and heritage by appropriation. Heritage by designation involves experts conferring heritage status on sites, buildings, and cultural objects through a top-down approach, often without public participation. This method can be predictable and uncontroversial, but can be criticised for being elitist and neglecting unconventional heritage. On the other hand, heritage by appropriation emphasises community and public involvement in identifying and preserving cultural expressions, leading to a more inclusive and dynamic understanding of heritage. Archaeological sites are also an integral part of tangible CH, offering invaluable insights into past societies and ways of life. As of May 2024, UNESCO’s long list of World Heritage Sites includes 1,199 cultural and natural sites in 168 different states parties — including 48 sites in transboundary regions[16]. Sites such as Machu Picchu, an impressive Inca citadel in the Peruvian Andes, bear witness to the architectural achievements and cultural practices of ancient civilisations. While archaeological sites are invaluable, they face significant threats such as looting, destruction, exploitation, and extreme weather phenomena [@bowman_transnational_2008; @micle_archaeological_2014]. To safeguard them, conservation efforts must be case-specific and include documentation and assessment of experiences gained [@aslan_protective_1997]. The preservation of tangible CH extends beyond physical objects to include libraries, archives and museums that house collections of books, manuscripts, historical documents and artefacts. Incidentally, the term ‘cultural property’ is also employed as a related concept to tangible CH, encompassing both movable and immovable properties as opposed to less tangible cultural expressions [@ahmad_scope_2006]. Cultural property is protected by a number of international conventions and national laws. For instance, the Blue Shield[17] — an international organisation established in 1996 by four non-governmental organisations[18] — aims to protect and preserve heritage in times of armed conflict and natural disasters [@van_der_auwera_unesco_2013]. Its mission was revised in 2016:

The Blue Shield is committed to the protection of the world’s cultural property, and is concerned with the protection of cultural and natural heritage, tangible and intangible, in the event of armed conflict, natural- or human-made disaster. [@blue_shield_blue_2016 art. 2.1]

Overall, tangible CH is a testament to human ingenuity and cultural diversity, and serves as a bridge between the past and the present. Its preservation is a collective responsibility, ensuring that the legacy of past generations endures and the wealth of cultural diversity continues to enrich the fabric of society.
3.1.1.2 Intangible Heritage The concept of intangible heritage emerged in the 1970s and was coined at the UNESCO Mexico Conference in 1982 [@leimgruber_switzerland_2010] with the aim of protecting cultural expressions that were previously excluded from preservation efforts [@hertz_politiques_2018]. UNESCO's previous focus had been on material objects, primarily from wealthier regions of the global North, leaving the intangible cultural heritage of the South overlooked. Attempts to protect intangible heritage through legal measures like copyright and patents were ineffective due to the collective nature of these cultural expressions and the anonymity of creators. The Convention acknowledges that intangible CH is essential for cultural diversity and sustainable development. Below is the definition given by the Convention for the Safeguarding of the Intangible Cultural Heritage: ‘The Intangible Cultural Heritage’ means the practices, representations, expressions, knowledge, skills – as well as the instruments, objects, artefacts and cultural spaces associated therewith – that communities, groups and, in some cases, individuals recognize as part of their cultural heritage. This intangible cultural heritage, transmitted from generation to generation, is constantly recreated by communities and groups in response to their environment, their interaction with nature and their history, and provides them with a sense of identity and continuity, thus promoting respect for cultural diversity and human creativity. [@unesco_basic_2022] According to UNESCO, intangible CH can be manifested in the following domains: oral traditions and expressions, including language as a vehicle of the intangible CH; performing arts; social practices, rituals and festive events; knowledge and practices concerning nature and the universe; traditional craftsmanship. Overall, intangible CH is a multifaceted concept that encompasses both traditional practices inherited from the past and contemporary expressions in which diverse cultural groups actively participate [@munjeri_tangible_2004; @leimgruber_was_2008]. It includes inclusive elements shared by different communities, whether they are neighbouring villages, distant cities around the world, or practices adapted by migrant populations in new regions. These expressions have been passed down from generation to generation, evolving in response to their environment, and play a crucial role in shaping our collective identity and continuity. Intangible CH promotes social cohesion, strengthens a sense of belonging and responsibility, and enables individuals to connect with different communities and society at large. Central to the nature of intangible CH is its representation within communities. Its value goes beyond mere exclusivity or exceptional importance; rather, it thrives on its association with the people who preserve and transmit their knowledge of traditions, skills and customs to others within the community and across generations. The recognition and preservation of intangible CH depends on the communities, groups or individuals directly involved in its creation, maintenance and transmission. Without their recognition, no external entity can decide on their behalf whether a particular practice or expression constitutes their heritage. The community-based approach ensures that intangible CH remains authentic and deeply rooted in the living fabric of society, protected by those who care for and perpetuate it. 
In Switzerland, the Winegrower’s Festival in Vevey (La Fête des Vignerons), a centuries-old event celebrating the world of winemaking [@vinckMetiersOmbreFete2019], and the Carnival of Basel (Basler Fasnacht) [@chiquet_how_2023] are examples of traditions that are listed among UNESCO’s intangible CH. (In)tangibility is not always a straightforward concept and can indeed be blurred, i.e. it goes beyond the mere idea of materialisation. Many artefacts and elements of CH possess both tangible and intangible qualities that intertwine and complement each other, making the distinction less clear-cut. For instance, the Male Face Mask known as ‘Zamble’, held at the Art Institute of Chicago[19] and originating from the Guro people in the Ivory Coast, holds dual significance as both tangible and intangible CH. As a tangible object, the mask is a physical artefact made from wood, pigment, fabric, and various adornments, combining animal and human features and representing the Guro people’s artistic skills. On the other hand, as an intangible cultural object, the Zamble mask carries profound spiritual and cultural meaning. It plays a significant role in commemorating the deceased during a man’s second funeral. These second funerals are organised months or even years after the actual burial as a way to honour and remember the departed [see @haxaire_power_2009]. Thus, the preservation and appreciation of both the tangible and intangible aspects of the mask are essential to its cultural relevance. Another example of the blurred line between tangible and intangible heritage is offered by @de_muynke_ears_2022, who recreate reported perceptions of the acoustics of Notre-Dame de Paris through a collaboration between the sciences of acoustics and anthropology. The authors highlight the heritage value of how people subjectively perceive sound in a space, particularly in places of worship where sound and music are integral to the religious experience. The authors advocate integrating the study of both material and non-material aspects to understand the changing sonic environments of heritage buildings [@de_muynke_ears_2022 pp. 1-2]. @katz_digitally_2023 articulates that ‘acoustics is an intangible product of a tangible building’. This integrated perspective could lead to a more holistic understanding of the dynamics between physical spaces and the perceptual and experiential dimensions attached to them.

3.1.1.3 Natural Heritage

Natural heritage, encompassing geological formations, biodiversity, and ecosystems of cultural, scientific, and aesthetic value, shares a significant overlap with CH. Many natural sites hold spiritual and symbolic importance for communities, becoming repositories of cultural memory and identity [@lowenthal_natural_2005]. Traditional ecological knowledge developed by various cultures also underscores the interconnectedness of cultural and natural heritage, as indigenous communities have accumulated wisdom on sustainable resource use and ecological balance [@azzopardi_what_2023]. Moreover, the conservation and sustainable management of natural heritage is often intertwined with efforts to protect CH, fostering a collective commitment to preserve these entangled legacies for future generations. The link between natural and cultural heritage goes beyond their shared values; spatial overlaps further accentuate their interdependence. Natural sites may have cultural significance, while CH sites may be situated within natural landscapes.
For example, a national park may include archaeological sites or culturally revered landscapes, thus intertwining the cultural and natural dimensions. This spatial intermingling highlights the inextricable relationship between human societies and the natural environment, as cultural practices and beliefs become intertwined with the landscapes they inhabit. In this way, the preservation of both natural and cultural heritage becomes essential not only for their intrinsic worth but also for sustaining the narrative of our shared human and environmental history. Additionally, the distinction between nature and culture is not merely subjective and dependent on human appreciation [@vandenhende_management_2017]. Rather, it is a concept intrinsically linked with the overarching framework of modernism, a perspective that has been critically examined and deconstructed by the influential sociologist and philosopher Bruno Latour, who argued that ‘we have never been modern’ [@latour_we_1993]. Latour’s deconstruction of the modernist perspective extends to the recognition that ‘the proliferation of hybrids has saturated the constitutional framework of the moderns’ [@latour_we_1993 p. 51]. This assertion underscores the fundamental challenge posed by hybrid entities – those that blur the boundaries between nature and culture – to the traditional categories upon which modernist thinking has been predicated. In essence, the concept of hybrids disrupts the neat divisions between the natural and social worlds that have been a hallmark of modernist discourse and provides us with an opportunity to situate ourselves as ‘amodern’ as opposed to postmodern [@latour_postmodern_1990]. In addition to Latour’s critique of the modernistic distinction between nature and culture, the concept of the ‘parasite’, as expounded by Michel Serres – one of the influential thinkers who significantly shaped Latour’s intellectual development [@berressem_deja_2015] – offers a valuable lens through which to examine the intricacies of interconnectedness and interdependence within our world. In his view, everything is enmeshed in a complex web of relationships that negates the existence of self-contained entities. Rather than seeing discrete and isolated entities, Serres invites us to see everything as an integral part of a larger system in which each component is inextricably dependent on the others [@serres_parasite_2014]. Together, these complementary perspectives invite us to reevaluate our understanding of the intricate tapestry of existence, emphasising the complexities of our relationship with the world. Thus, the appreciation of nature and culture is not mutually exclusive, but rather forms a continuous and evolving relationship. The modern perspective has historically separated these realms, treating them as distinct and disconnected. However, a more inclusive approach dissolves this artificial boundary and recognises the interconnectedness of nature and culture [@haraway_encounters_2008; @haraway_staying_2016]. This paradigm shift challenges the traditional modern understanding and invites a more holistic view in which natural and cultural heritage are mutually constructed within a complex network of relationships. Recognition of this relationship is essential in the context of heritage conservation and understanding.
The dynamic interplay between nature and culture is recognised, and the acknowledgement of their coexistence promotes a more holistic approach to heritage conservation, where cultural practices, traditions and ecological systems are seen as interdependent aspects of the wider heritage tapestry. This recognition encourages us to see heritage sites not as isolated entities, but as part of a larger web of interconnectedness, and urges us to conserve and value both cultural and natural heritage with a shared responsibility. Adopting this interconnected perspective enables us to appreciate the profound connections between human societies and the natural world, and inspires a collective commitment to safeguarding these precious legacies for future generations.

3.1.1.4 Cultural Heritage Data

As I embark on the exploration of CH data, it is first necessary to establish a basic understanding of data in this context. At its core, data represents more than mere numbers and facts; it constitutes a collection of discrete or continuous values that are assembled for reference or in-depth analysis. In essence, data are the rich tapestry upon which the narratives of CH are woven, making their comprehension a critical prerequisite for our expedition into this domain. Luciano Floridi — a prominent philosopher in the field of information and digital ethics — provides a thorough perspective on the term ‘data’ in his PI and offers valuable insights into its fundamental nature. He perceives ‘data at its most basic level as the absence of uniformity, whether in the real world or in some symbolic system. Only once such data have some recognisable structure and are given some meaning can they be considered information’ [@floridi_information_2010]. This initial definition sets the stage for a deeper exploration of Floridi’s understanding of data, as he further focuses on its transformative journey into a more meaningful and structured form, which we will explore next. Building upon Floridi’s foundational concept of data as the absence of uniformity, his subsequent definition provides a more comprehensive perspective. In a previous work, @floridi_is_2005 [p. 357] argues that ‘data are definable as constraining affordances, exploitable by a system as input of adequate queries that correctly semanticise them to produce information as output’. This definition highlights the dynamic role of data, not only as raw entities awaiting structure and meaning but also as elements imbued with the potential to constrain and guide systems towards the generation of meaningful information. Transitioning from Floridi’s concept of data, we progress to the view that data can notably be seen as interpretable texts within the DH perspective. According to @owens_defining_2011, there are four main perspectives on how Humanists can engage with data:

Data as constructed artefacts: data are a product of human creation, not something inherently raw or neutral;

Data as interpretable texts: Humanists can interpret data as authored works, considering the intentions of the creators and how different audiences understand and use the data;

Data as processable information: data can be processed by computers, allowing various forms of visualisation, manipulation and analysis, which can lead to further perspectives and insights;

Data can hold evidentiary value: data, as a form of human artefact and cultural object, can provide evidence to support claims and arguments.
These considerations highlight the multifaceted nature of data within the field of DH. It is in this complex landscape that we recognise that data transcend their traditional role as a passive entity. As @rodighiero_mapping_2021 [p. 26, citing @akrich_sociologie_2006] suggests, ‘there is no doubt that data are full-fledged actors that take part in the social network the actor-network theory describes, in which both human and non-human intertwine and overlap’. This notion – rooted in and borrowed from STS – reinforces the idea that data, as an active and dynamic entity, play a significant role in shaping the interactions between human and non-human actors in any digital sphere. From these angles, I can look at the characteristics of CH data. @bruseker_cultural_2017 [p. 94] articulate that ‘data coming from the cultural heritage community comes in many shapes and sizes. Born from different disciplines, techniques, traditions, positions, and technologies, the data generated by the many different specializations that fall under this rubric come in an impressive array of forms’. In exploring CH data, it is important to recognise the inherent diversity stemming from different disciplines, techniques, and traditions. @bruseker_cultural_2017 [p. 94] aptly emphasise this, highlighting the extensive array of forms in which data manifest. This heterogeneity raises fundamental questions about the unity and identity of CH data — a crucial aspect deserving acknowledgement within this context. As the authors astutely ponder:

It could be a natural problem to pose from the beginning: if the data of this community indeed presents itself in such a state of heterogeneity, does it not beg the question if there is truly an identity and unity to cultural heritage data in the first place? It could be argued that Cultural Heritage, as a term, offers a fairly useful means to describe the fuzzy and approximate togetherness of a wide array of disciplines and traditions that concern themselves with the human past.

Expanding on these insights, CH data refer to digital or data-driven affordances of CH[20], embodying a rich and varied compilation of insights originating from a variety of disciplines, techniques, traditions, positions and technologies. They encompass both tangible and intangible aspects of a society’s culture as well as natural heritage. These data, derived from a wide range of disciplines, offer a latent capacity to support the generation of knowledge relating to historical time periods, geospatial areas, as well as current and past human and non-human activities. They are collected, curated and maintained by various entities such as libraries, archives, museums, higher education institutions, non-governmental organisations, indigenous communities and local groups, as well as by the wider public. Building further on the mosaic of CH data, three primary dimensions come to the fore: heterogeneity, knowledge latency, and custodianship.

Heterogeneity: a fundamental characteristic, signifying the diverse forms and origins that shape this invaluable reservoir of human heritage. Different techniques and varying viewpoints in treating modelling also contribute to this heterogeneity [@guillem_faire_2023].

Knowledge latency: highlights the temporal dimension, presenting CH data as a repository of latent knowledge awaiting discovery and interpretation.
Notably, not all artefacts are – or should be – digitised, and even among those that are, (mis)representation and challenges in interconnecting them persist [@rossenova_iterative_2022]. Besides, the issue of structured data – or the lack of it – reinforces the aspect of knowledge latency [@haciguzeller_emerging_2021].

Custodianship: reinforces the essential role played by a variety of entities, predominantly CHIs, in safeguarding and managing resources, ensuring their preservation and accessibility for present and future generations. However, it is very important to acknowledge the great divide in terms of resources, with indigenous and local communities often facing challenges in custodianship responsibilities.

Taken together, these dimensions contribute to a comprehensive understanding of the nuanced fabric of CH data. They reveal the diversity of forms and origins, the temporal aspects and the responsible stewardship that are crucial to the sustainability of such data. By shifting our focus to the sphere of humanities data, we broaden our scope to extend beyond the peculiarities of CH data. Drawing parallels between these areas allows us to grasp the interconnectedness of our heritage. CH data usually refer to information about cultural artefacts, sites, and practices that hold historical or cultural significance. Humanities data encompass information about human culture, history, and society, including literature, philosophy, art, and language [@tasovac_cultural_2020]. Both often involve ethical considerations, such as ownership, access, and preservation, and require a comprehensive understanding of their various meanings and values [@ioannides_towards_2019]. Moreover, @schoch_big_2013 explains that data in the humanities, such as text and visual elements, have unique qualities. While these analogue forms could be considered data, they lack the ability to be analysed computationally as they are non-discrete. The semiotic nature of language, text and art introduces dimensions tied to meaning and context, making the term ‘data’ problematic. Critics question its use because it conflicts with humanistic principles such as contextual interpretation and the subjective position of the scholar. @schoch_big_2013 further distinguishes data in the humanities into two core types: smart and big data. The former tends to be small in volume and carefully curated, but harder to scale, such as digital editions. As for the latter, it describes voluminous and varied data, and it loosely relies on the three Vs coined by @laney_3d_2001: volume, velocity and variety (see 3.3.1.2). Yet, big data in the humanities differs significantly from other fields as it rarely requires rapid real-time analysis, is less focused on handling massive volumes, and instead deals with diverse, unstructured data sources. @schoch_big_2013 concludes by arguing that ‘I believe the most interesting challenge for the next years when it comes to dealing with data in the humanities will be to actually transgress this opposition of smart and big data. What we need is bigger smart data or smarter big data, and to create and use it, we need to make use of new methods’. Data processing offers great potential for humanities research, as @owens_defining_2011 argues: ‘In the end, the kinds of questions humanists ask about texts and artifacts are just as relevant to ask of data.
While the new and exciting prospects of processing data offer humanists a range of exciting possibilities for research, humanistic approaches to the textual and artifactual qualities of data also have a considerable amount to offer to the interpretation of data'. While the term 'data' in the context of the humanities may raise questions due to its semiotic and contextual complexities, it serves as a foundation for understanding both CH data and broader humanities data.

The data originating from CH and the humanities are inherently intertwined, as they often share a similar nature and purpose for scholars. This strong interconnection leads to a collaborative relationship between the GLAM sector and the humanities or DH. Scholars in the humanities frequently rely on digitised cultural artefacts, historical records, linguistic resources, and literary works provided by GLAM institutions to gain valuable insights into human history, culture, and traditions. The digitisation efforts and research collaborations between these entities play a pivotal role in preserving CH data and advancing our understanding of diverse societies, fostering a deeper appreciation of our shared human heritage. CH data and humanities data are distinct from other scientific data due to their qualitative and subjective nature, which requires different methods of analysis than quantitative scientific data. They include archival and special collections, rare books, manuscripts, photographs, recordings, artefacts, and other primary sources that reflect the cultural beliefs, identity, and memory of a people [see @sabharwal_2_2015; @izu_sociocultural_2022].

In summary, while CH data and humanities data share some commonalities, they differ in terms of scope and subject matter. CH data focus specifically on the preservation and documentation of physical artefacts and intangible attributes, while humanities data encompass a broader range of disciplines within the humanities [@munster_digital_2019]. However, it is important to note that the distinction between the two can be blurred, as (meta)data should ideally be co-created and integrated across both domains.

### 3.1.2 Representation and Embodiment of Cultural Heritage Data

Representing CH data digitally while preserving their context and complexity remains a significant challenge. These representations, sometimes referred to as digital surrogates or digital twins [@conway_digital_2015; @shao_digital_2018; @semeraro_digital_2021], can lead to a loss of context and a reduction in the richness of the CH represented. For instance, a digital image of a cultural artefact may not capture its materiality, such as its texture, weight, and feel, which are essential aspects of the artefact's cultural significance [@force_context_2021]. Furthermore, digital representations may also exclude vital social, cultural, and historical contexts surrounding the object, which are crucial to understanding its full cultural value [@cameron_beyond_2007]. This subsection is structured around two key dimensions. Firstly, it explores materiality, highlighting how digital representations may fail to capture important aspects that are integral to understanding the significance of CH resources. Secondly, it navigates the convergence and divergence between digitised CH and digital heritage.

#### 3.1.2.1 Materiality

Briefly, materiality refers to the physical qualities of an object or artefact, such as its colour, texture, and composition.
Within built heritage, the emphasis of materiality relates primarily to architecture, its associated techniques and the range of materials used in the construction or renovation of a building. More specifically, materiality acts as a pivotal factor in the transformation of disparate fragments of material culture into heritage, providing a vital link to the intangible facets of heritage. It contributes significantly to an individual's social position and ability to navigate specific social milieus, thereby determining their ability to transmit cultural knowledge and values to future generations. The transformative potential of materiality in this regard underscores its fundamental role in perpetuating heritage and the transmission of cultural legacies [@carman_where_2009]. The physical attributes of objects, including texture, colour and shape, can evoke different emotions and associations, shaping people's perceptions and memories of events. Beyond retrospective influences, the potential of materiality extends to the creation of new memories and meanings, as exemplified by the use of materials such as glass in contemporary art. In such cases, materials evoke not only their inherent properties but also symbolic connotations, adding new layers of meaning and memory to the artistic narrative [@fiorentino_persistence_2023].

@edwards_photographs_2004 [p. 3] argue that materiality is not just concerned with physical objects in a positivist sense, but also involves complex and fluid relationships between people, images, and things. This relationship is influenced by social, cultural, and historical contexts, and plays a crucial role in shaping our perceptions and experiences of the world. Moreover, materiality is central to giving meaning to non-human entities [see @latour_actor-network_1996; @haraway_companion_2003; @star_institutional_1989], which emphasises the role of both humans and non-humans in shaping social and cultural phenomena. For CH data, diversity is at its core, as it allows for the exploration of different ways of knowing, experiencing, and expressing the world. Therefore, it is important to approach materiality not as a static and fixed concept, but as a dynamic and evolving phenomenon that is shaped by multiple forces [@hahn_digitale_2018 pp. 62-63].

When discussing materiality, there is also its negation, i.e. the notion of space or emptiness, and how people interact with it through built heritage, which is regarded as a primordial medium of material culture, as expounded by @guillem_rcc8_2023 [p. 2]:

> The most intuitive and foundational definition of architecture is the built thing, that is the architecture qua building or built work. Human beings continuously interact with the built materiality through the non-materiality of space. Space as emptiness is formed and defined by the materiality that affects its existence. That relation between fullness and emptiness is what makes possible architecture as lived and experienced space.

Materiality also offers a means of challenging dominant narratives and power structures, particularly the Western-centric perspective on CH. It gives greater recognition to the importance of intangible CH, which often takes a back seat to tangible objects in dominant narratives [@lenzerini_intangible_2011].
By highlighting the materiality of marginalised or forgotten elements, individuals can reclaim their heritage and challenge dominant narratives that marginalise certain groups, contributing to a more inclusive and accurate representation of CH. The primary focus of digitisation is also on preserving material-based knowledge, often overlooking the dynamic and living nature of intangibility. @hou_digitizing_2022 stress the crucial role of computational heritage and advances in information technologies in preserving and improving access to intangible CH. Effectively documenting the ephemeral aspects of intangible heritage and communicating the knowledge that is deeply linked to individuals are pressing challenges. Recent initiatives seek to capture the dynamic facets of cultural practices, using visualisation, augmentation, participation and immersive experiences to enhance experiential narratives. There is a strong call for a strategic re-evaluation of the intangible CH digitisation process that emphasises the human body as a vessel for traditions and memories – for instance, capturing traditional Southern Chinese martial arts, which have been passed down orally across generations and require a methodological approach to record such embodied knowledge [see @adamou_facets_2023; @hou_ontology-based_2024].

Even where considerable efforts have been devoted to the digitisation of physical objects such as medieval manuscripts and rare books over the past few decades [@nielsen_digitisation_2008], a lingering concern persists regarding the authentic encounter with the original artefact, despite its enhanced accessibility through digital surrogates [@van_lit_digital_2020]. Material attributes present a persistent challenge to full replication. Techniques such as RTI and 3D digitisation, or VR and AR, offer better experiential immersion and address certain materiality concerns more effectively than two-dimensional representations. Yet replicating the multifaceted sensory experience associated with the original object, including its palpable emotions and spatial sensation, remains an ongoing endeavour – a complex and multifaceted challenge that may never be fully met [see @endres_digitizing_2019].

#### 3.1.2.2 Digitised Cultural Heritage and Digital Heritage

The concepts of digitised CH and digital heritage intersect through the use of digital technology for the preservation, access, and dissemination of CH resources. Digitised CH focuses on converting physical artefacts into digital forms, ensuring their long-term preservation and accessibility through digital means. Conversely, digital heritage includes a broader range of digital tools and resources 'to preserve, research and communicate cultural heritage' (@munster_digital_2021 p. 2, citing [@georgopoulos_cipas_2018]).

Digitised CH acts as a critical bridge, facilitating the transition from traditional or analogue GLAM practices to a digital environment. This shift is pivotal in unlocking the potential of digitised CH, whose value extends beyond scholarly pursuits, despite the majority of digitisation efforts being driven by research funding. It thus becomes evident that the creative reuse and data-driven innovation stemming from digitised CH necessitate substantial and sustained investment in the GLAM sector. This investment is fundamental, especially amidst reduced funding due to years of austerity.
@terras_value_2021 underscore this need, shedding light on the delicate balance required with commercial outcomes. They emphasise that leveraging CH datasets offers vast opportunities for technological innovation and economic benefits, urging professionals from various domains to collaborate and experiment in a low-risk environment.

Digital heritage[21] encompasses a wide range of human knowledge and expression in cultural, educational, scientific and various other domains. In today's rapidly evolving technological landscape, an increasing amount of this knowledge is either digitally created or in the process of being converted from analogue to digital formats [@he_digital_2017]. These digital resources cover a wide range of forms, including text, multimedia and software, and require deliberate and strategic management to ensure their long-term preservation. This valuable heritage is spread across the globe and expressed in multiple languages [@unesco_charter_2009].

In summary, digitised CH not only forges the path to digital heritage but also embodies an ever-evolving cultural landscape. Recognising the transformative potency of digital heritage is essential to enriching our understanding of, and engagement with, our cultural roots. Both concepts are intimately embedded in CH and play a vital role as conduits.

### 3.1.3 Collectives and Apparatuses

The collaborative efforts of collectives and the operation of various apparatuses play a fundamental part in shaping the preservation, interpretation and dissemination of cultural artefacts and practices. This subsection is concerned with the central contributions of human and non-human actors engaged in cooperative action and with the modus operandi of various apparatuses, such as building (digital) infrastructures. Some of these considerations are drawn from STS, which serves as the theoretical framework of this thesis.

Bruno Latour's concept of the importance of collectives and apparatuses [see @latour_habiter_2022 p. 15] can be extrapolated to CHIs. Every institution's or project's ultimate success hinges on the collaboration and support of individuals, as well as the tools, systems and technologies they use. Indeed, paralleling CHIs with wider contexts suggests that collective efforts and apparatuses play a critical role in shaping the effectiveness of any institution. This highlights the importance of recognising the influence of both human and non-human entities in institutional functioning and underlines the need for a more comprehensive understanding of the dynamics involved therein.

ANT can be a useful lens through which to analyse the creation, use, and dissemination of CH data. ANT posits that actors are not independent entities but are instead part of a network that consists of both human and non-human entities. According to ANT, every actor, be it a person or a technology, is a node in the network and contributes to the overall functioning of the network [@latour_reassembling_2005; @callon_actor_2001]. When we apply this framework to CHIs, we can identify the different actors involved in the creation, use, and dissemination of CH data. These actors can include individuals, such as curators, conservators, and historians, as well as non-human entities, such as databases, digitisation equipment, and software. Moreover, this approach can help us understand the interactions between these actors and how they shape the overall functioning of CHIs.
For instance, digitisation equipment can enable the creation of high-quality digital images of artefacts, which can then be disseminated globally through online platforms. Examining Notre-Dame de Paris, one can discern the keystones at the summit of its arches as indispensable actors within its architectural narrative. These keystones, imbued with historical narratives and a non-human facet, played a central role in the (digital) rescue and subsequent restoration efforts following the tragic roof fire in April 2019. @guillem_faire_2023's study further elucidates this restoration journey, emphasising how the keystones, with their individual narratives and structural significance, contributed to the (digital) reassembly.

Building on this perspective, we can explore the importance of community involvement in the preservation and management of CH data, thereby increasing the potential for sustainable practices and inclusive engagement. Local communities have an integral part to play in the management and preservation of CH data, especially in the digital age, where resources are often scarce for GLAM institutions. Community involvement has several benefits, including increased engagement and participation, access to local knowledge and expertise, and more sustainable and inclusive management and preservation practices [@ridge_12_2021]. For instance, geophysical technologies such as ground-penetrating radar have been used with great success in identifying and evaluating the depth, extent, and composition of CH resources for research and management purposes, easing tensions when working with sensitive ancestral places [@nelson_role_2021]. Collaborative environments can also help with CH information sharing and communication tasks because of the way in which they provide a visual context to users, making it easier to find and relate CH content [@respaldiza_hidalgo_metadata_2011].

Drawing on @brown_communities_2023 [pp. 6-7]'s insightful analysis, a prominent illustration of exemplary community practice can be found in the sphere of community museums in Latin America: Inicio - Museos Comunitarios de América[22]. The author highlights the role of community engagement and leadership in the creation and operation of these museums. Such engagement ensures that these museums are not imposed from outside, but rather emerge organically as museums of the community, resonating with its unique CH and identity. This approach is consistent with the ethos of 'telling a story' and building a future, embodying a deep commitment to community empowerment and cultural preservation. This community-centric approach amplifies the museum's resonance with the community's lived experiences and historical narratives.

At the same time, institutions can also benefit from collaborating with peer communities like IIIF to promote greater access to their collections. IIIF provides a set of open standards for delivering high-quality digital objects online at scale, which can help memory and academic institutions share their collections with each other and with the wider public [@snydman_international_2015; @weinthal_iiif_2019]. By adopting IIIF standards, organisations can make their collections more discoverable and accessible to researchers, developers, and other CH professionals [@padfield_joseph_practical_2022]. Involvement in communities such as IIIF also helps to mitigate costs, as they develop shared or adaptable resources and services [@raemy_international_2017].
Participation of communities in the management and preservation of CH resources is essential to ensure that CH is protected and accessible for future generations. By involving and participating in communities, GLAMs can tap into local as well as peer knowledge and expertise, making management and preservation practices more sustainable and inclusive. This approach also increases engagement and participation, ensuring that CH is valued and appreciated by the wider community. Thus, memory institutions need to collaborate closely with communities to ensure that CH data, and their underlying infrastructures and services, are being effectively curated [@delmas-glass_fostering_2020].

Closely related to this context, @star_ethnography_1999 points out the often unacknowledged role of infrastructure within society. She argues that infrastructures are necessary but often invisible and taken for granted:

> People commonly envision infrastructure as a system of substrates – railroad lines, pipes and plumbing, electrical power plants, and wires. It is by definition invisible, part of the background for other kinds of work. It is ready-to-hand. This image holds up well enough for many purposes – turn on the faucet for a drink of water and you use a vast infrastructure of plumbing and water regulation without usually thinking much about it. [@star_ethnography_1999 p. 380]

@star_ethnography_1999 [pp. 381-382, citing [@star_steps_1994]] identifies nine dimensions to define infrastructure. They provide a comprehensive framework for comprehending the nuanced nature of infrastructure and its pervasive impact on diverse societal facets. The following dimensions are vital for analysing the often imperceptible, yet deeply embedded structures that constitute the foundational framework of both daily life and broader societal operations[23]:

- Embeddedness: Infrastructure is sunk into and inside of other structures, social arrangements, and technologies. People do not necessarily distinguish the several coordinated aspects of infrastructure.
- Transparency: Infrastructure is transparent to use, in the sense that it does not have to be reinvented each time or assembled for each task, but invisibly supports those tasks.
- Reach or scope: This may be either spatial or temporal – infrastructure has reach beyond a single event or one-site practice.
- Learned as part of membership: Strangers and outsiders encounter infrastructure as a target object to be learned about. New participants acquire a naturalised familiarity with its objects as they become members.
- Links with conventions of practice: Infrastructure both shapes and is shaped by the conventions of a community of practice.
- Embodiment of standards: Modified by scope and often by conflicting conventions, infrastructure takes on transparency by plugging into other infrastructures and tools in a standardised fashion.
- Built on an installed base: Infrastructure does not grow de novo; it wrestles with the inertia of the installed base and inherits strengths and limitations from that base.
- Becomes visible upon breakdown: The normally invisible quality of working infrastructure becomes visible when it breaks: the server is down, the bridge washes out, there is a power blackout.
- Is fixed in modular increments, not all at once or globally: Because infrastructure is big, layered, and complex, and because it means different things locally, it is never changed from above. Changes take time and negotiation, and adjustments to other aspects of the systems are involved.
An appreciation of these dimensions is crucial to the analysis of the network of infrastructural systems that underpin contemporary society, and it is necessary for the analysis of any digital infrastructure that manages CH data. Digital infrastructures – also known as e-infrastructures or cyberinfrastructures – are forms of infrastructure that are essential for the functioning of today's society [see @jackson_understanding_2007; @ribes_sociotechnical_2010]. These kinds of infrastructure need to be understood as socio-technical systems, showcasing the interplay between technological components (such as hardware, software, and networks) and the social and organisational contexts in which they operate [@star_steps_1994].

According to @fresa_data_2013 [p. 33], digital CH infrastructures should be able to serve the research needs of humanities scholars as well as offer dedicated services for education, learning, and general public access. In terms of requirements, @fresa_data_2013 [pp. 36-39] identifies three different layers of services: for content providers, for managing and adding value to the content, and for the research communities. For the latter, several sub-services tailored to research communities are listed. These encompass long-term preservation, PIDs[24], interoperability and aggregation, advanced search, data resource set-up, user authentication and access control, as well as rights management.

Overall, (digital) infrastructures are imperative apparatuses in preserving and sharing CH data. First, they support preservation by archiving digital artefacts and their metadata, protecting them from deterioration and loss. Secondly, these infrastructures facilitate accessibility, allowing a global audience to explore and appreciate cultural heritage online. Finally, they encourage interpretation and engagement, promoting cross-cultural understanding and knowledge sharing. Moreover, infrastructure is a fundamental component that demands extensive investment, particularly in the creation of streamlined integration layers capable of interacting seamlessly with different systems. This can be exemplified by institutions such as the Rijksmuseum[25], where a well-constructed infrastructure allows for efficient integration and interaction with various technological and organisational systems [@dijkshoorn_building_2023]. This investment serves as the foundation for an institution's functionality, allowing for the smooth flow of data, the coordination of processes and the optimal use of resources.

In a similar vein, @canning_power_2022 argue that the often invisible structures of metadata, particularly in Linked Data ontologies, play a crucial role in shaping the interpretation of data. These structures, while not immediately apparent, are imbued with value judgements and ideological implications, extending the impact of metadata beyond mere technicalities to encompass diverse and intersectional perspectives. This multidimensional ontological approach addresses the complexity and diversity of data sources, paralleling the need for sophisticated infrastructures in institutions like the Rijksmuseum. It underscores the importance of integrating intersectional feminist principles in information systems, reflecting a commitment to diverse ways of knowing and nuanced storytelling. Furthermore, as all (meta)data require storage, an important concern arises regarding the entrenched power dynamics governing knowledge representation within information systems, as pointed out by @canning_what_2023.
This perspective, initially centred around museum objects, holds broader implications for all CH resources [see @simandiraki-grimshaw_what_2023]. Canning strongly advocates for the essential adaptation of databases to embrace a diverse array of epistemological approaches by introducing new types of affordances. Despite their role in information preservation, databases wield significant influence that can inadvertently stifle diverse modes of knowledge interpretation and 'can constrain ways of knowing'. Furthermore, she compellingly argues that modifications to databases extend beyond technical adjustments; they are inextricably linked to shifts in institutional power dynamics and the enduring, often inequitable, power dynamics governing the world of museums – or any CHIs – and their curation.

In understanding the interplay of collectives and apparatuses, it is clear that key actors, including individuals, institutions, local and global communities, as well as the sophisticated fabric of (digital) infrastructures and their components, are deeply entangled and interconnected. These entities, both human and non-human, collectively shape and navigate the rich networks of human interactions and technologies that underpin the foundations of contemporary society.

## 3.2 Cultural Heritage Metadata

This section offers insights into the importance of metadata in CH, underlining its role in enhancing the understanding and accessibility of cultural artefacts. It is structured into four[26] essential parts. I start with an introductory segment in 3.2.1; I then explore the types and functions of metadata in 3.2.2; thirdly, in 3.2.3, I outline some of the most important CH metadata standards; and finally, in 3.2.4, I explore the use of KOS, such as generic classification systems and controlled vocabularies.

### 3.2.1 Data about Data

For curating CH resources, metadata[27], 'data about data', is probably one of the key concepts that need to be introduced here. Metadata permeate our digital and physical landscapes, playing a vital role in organising, describing and managing a vast array of information. Rather than being confined to a specific domain, they are ubiquitous and pervade many aspects of our everyday lives [@riley_understanding_2017 pp. 2-3]. From websites and databases to social media platforms and online marketplaces, metadata add meaning to data, enabling users to understand their context, relevance and provenance. As an example, Figure 3.1 shows the metadata of a book[28].

Figure 3.1: Snapshot from the Swisscovery Platform Showing the Bibliographic Record of @zeng_metadata_2022
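To make the notion concrete, the same bibliographic description can be sketched in a handful of fields. The following is a hypothetical rendering of the record from Figure 3.1 using the fifteen-element Dublin Core vocabulary (introduced in 3.2.3.4); the `<record>` wrapper and the selection of fields are assumptions made for illustration, not the platform's actual export format:

```xml
<!-- Hypothetical Dublin Core sketch of the record shown in Figure 3.1;
     the <record> wrapper is invented, the dc: elements are standard. -->
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Metadata</dc:title>
  <dc:creator>Zeng, Marcia Lei</dc:creator>
  <dc:creator>Qin, Jian</dc:creator>
  <dc:publisher>ALA Neal-Schuman</dc:publisher>
  <dc:date>2022</dc:date>
  <dc:identifier>9780838948750</dc:identifier>
  <dc:subject>Metadata</dc:subject>
</record>
```

None of these elements reproduce the book's content; together, however, they make the book findable, citable and manageable, which is precisely what 'data about data' means in practice.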
Metadata are central to the management and preservation of CH data, providing essential information to ensure that data can be properly organised, discovered and retrieved. For example, they facilitate the understanding and interpretation of data, enabling scholars and the public to access and use them effectively [@constantopoulos_aspects_2008]. Metadata also help to ensure the long-term preservation and accessibility of CH data [@zeng_metadata_2022 pp. 490-491]. Providing metadata in a structured manner facilitates forms of aggregation, i.e. individuals and institutions being able to harvest and organise metadata from multiple sources or repositories into a centralised location [see @freire_survey_2017; @freire_metadata_2021]. In addition, the importance of metadata as a gateway to information is particularly compelling when the primary embodiment of a record is either unavailable or lost. In cases where resources, time constraints, sensitive content or strategic decisions prevent the digitisation of an item, metadata become the principal means of representation and access. If a physical record is lost or damaged, the metadata associated with that record act as a proxy for it.

@riley_understanding_2017 [p. 5] discusses the transformation of libraries over time, as they moved from search terminals to the modern web-based resource discovery systems we use today, a shift driven by advances in computerisation. Libraries' basic approach to metadata is 'bibliographic', deeply rooted in their traditional expertise in describing books. This approach involves providing detailed descriptions of individual items so that users can easily locate them within the library's collection. On the other hand, archives use 'finding aids', which are descriptive inventories of their collections, coupled with historical context. These aids are essential for users to understand the material and to find groups of related items within the archive. The metadata used in archives allow for the contextualisation of materials, particularly papers of individuals or records of organisations, providing a richer understanding of the content. Similarly, museums actively manage and track their acquisitions, exhibitions and loans through metadata. Museum curators use metadata to interpret collections for visitors, explaining the historical and social significance of artefacts and describing the relationships and connections between different objects. This helps to enhance the overall visitor experience and understanding of the artefacts on display or the digital resources on a particular website.

### 3.2.2 Types and Functions

CHIs share common objectives and concerns related to information management, as highlighted by @lim_metadata_2011 [pp. 484-485]. These goals typically include facilitating access to knowledge and ensuring the integrity of CH data. However, it is important to note that CHIs also differ widely in how they deal with metadata. Different domains have unique approaches and standards for describing the materials they collect, preserve and disseminate, and even within a single domain there are significant differences. There have been several attempts to categorise the metadata landscape. For instance, @baca_setting_2016 identified the following five categories of metadata and their respective functions:

- Administrative: Metadata used in managing and administering collections and information resources, such as acquisition and appraisal information or documentation related to repatriation.
- Descriptive: Metadata used to identify, authenticate, and describe collections and related trusted information resources. Finding aids, cataloguing records, annotations by practitioners and end users, as well as metadata generated by or through a given DAM system can often be classified as descriptive metadata.
- Preservation: Metadata related to the preservation management of collections and information resources. Common examples of preservation metadata are documentation of the physical condition of resources or of any actions taken to preserve resources, whether physical restoration or data migration.
- Technical: Metadata related to how a system functions or how metadata behave. Examples include software documentation and digitisation information.
- Use: Metadata related to the level and type of use of collections and information resources, such as circulation records, search logs, or rights metadata.

Meanwhile, @riley_seeing_2009, as illustrated in a comprehensive visualisation graph[29], suggested seven functions, i.e. the roles a standard plays in the creation and storage of metadata, and seven purposes, referring to the general type of metadata.

- Functions: Conceptual Model, Content Standard, Controlled Vocabulary, Framework/Technology, Markup Language, Record Format, and Structure Standard.
- Purposes: Data, Descriptive Metadata, Metadata Wrappers, Preservation Metadata, Rights Metadata, Structural Metadata, and Technical Metadata.

Almost a decade later, @riley_understanding_2017 [pp. 6-7] summarised metadata types into four groupings instead of the seven purposes previously mentioned: Data is removed from the list, and technical, preservation and rights metadata are now grouped into a newly created administrative metadata category.

1. Descriptive metadata: For finding or understanding a resource
2. Administrative metadata: An umbrella term referring to the information needed to manage a resource or that relates to its creation
    - 2.1 Technical metadata: For decoding and rendering files
    - 2.2 Preservation metadata: Long-term management of files
    - 2.3 Rights metadata: Intellectual property rights attached to content
3. Structural metadata: Relationships of parts of resources to one another
4. Markup languages: Integrate metadata and flags for other structural or semantic features within content[30]

This classification of metadata types and functions differs from the categories identified by @baca_setting_2016 mostly due to the addition of structural metadata and markup languages as their own categories [@zeng_metadata_2022 p. 19]. Table 3.1 lists the major types of metadata according to @riley_understanding_2017 [p. 7] and includes example properties and their primary uses.

Table 3.1: Types of Metadata According to @riley_understanding_2017 [p. 7]

| Metadata (sub)type | Example properties | Primary uses |
|---|---|---|
| 1. Descriptive metadata | Title, Author, Subject, Genre, Publication date | Discovery, Display, Interoperability |
| 2.1 Technical metadata | File type, File size, Creation date, Compression scheme | Interoperability, Digital object management, Preservation |
| 2.2 Preservation metadata | Checksum, Preservation event | Interoperability, Digital object management, Preservation |
| 2.3 Rights metadata | Copyright status, Licence terms, Rights holder | Interoperability, Digital object management |
| 3. Structural metadata | Sequence, Place in hierarchy | Navigation |
| 4. Markup languages | Paragraph, Heading, List, Name, Date | Navigation, Interoperability |
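Markup languages differ from the other three types in that the metadata live inside the content itself rather than in a separate record. As a minimal, hypothetical illustration – the element names are genuine TEI elements, while the text and attribute values are invented – consider:

```xml
<!-- Hypothetical TEI-flavoured fragment: the inline tags flag a place,
     a date and a person, so the markup doubles as navigable metadata. -->
<p>The expedition reached <placeName>Basel</placeName> on
  <date when="1921-06-14">14 June 1921</date>, led by
  <persName>Anna Keller</persName>.</p>
```

Flagged in this way, names and dates can be indexed, linked or extracted without disturbing the reading text, which corresponds to the 'Navigation, Interoperability' uses listed for markup languages in Table 3.1.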
Ultimately, metadata can also be leveraged to create more inclusive and diverse representations of CH. For instance, metadata can be used to document and promote underrepresented communities and their heritage, providing greater visibility and recognition. This approach aligns with the principles of decolonising CH, promoting equity and social justice by recognising and valuing diverse cultural perspectives, especially against the prevailing anglophone and Western-centric standpoint in DH [@mullaney_internet_2021; @mahony_cultural_2018]. Moreover, the distinction between data and metadata, as discussed in the work of @alter_view_2023, is not always clear-cut, leading to the concept of 'semantic transposition'. This complexity is reflected in CH, where what is considered metadata in one context might be primary data in another, underscoring the necessity for adaptable frameworks in data management. This understanding is crucial for fostering inclusive and diverse representations in CH, ensuring that all cultural narratives are appropriately documented and acknowledged.

### 3.2.3 Standards

Metadata standards play a crucial role in ensuring that data are organised and consistent, facilitating mutual understanding between different stakeholders [@raemy_enabling_2020]. CHIs such as GLAMs typically follow established conventions or standards when organising their resources. Current methods of cataloguing have historical roots dating back to the 19th century, particularly with the development of cataloguing systems such as Antonio Panizzi's at the British Museum and Charles Coffin Jewett's efforts to mechanically duplicate entries at the library of the Smithsonian Institution [@zeng_metadata_2022 pp. 14-15]. Unique metadata standards, rules and models have been established and maintained within specific sub-fields. In addition, certain standards for information resources have been endorsed by authoritative bodies [@greenberg_understanding_2005], and some are used exclusively within specific domain communities [@hillmann_metadata_2008]. @riley_understanding_2017 [p. 5] underscores the predilection of CH metadata – whether these standards emanate from libraries, archives, or museums – toward accentuating descriptive attributes, a thematic focus manifested by the foundational CH metadata standards [@zeng_metadata_2022 p. 11].

Within the CH domain, metadata standards vary widely in scope, and a number of different standards have been developed to meet different needs and priorities[31] [@freire_availability_2018]. The following passage sheds some light on the different approaches and levels of collaboration in metadata standardisation, namely among the library and museum sectors:

> Despite the striving for homogeneity, in practice, the production of metadata among information specialists and the use of metadata standards is already marked by considerable diversity. This has come about for very pragmatic reasons. Different types of objects and collections require different types of metadata. The curatorial interest for particular information differs for example between images held in an art gallery and a library, as does the information specialists' domain expertise. Accordingly, diversity in metadata practice seems to be greatest in museums as they are the institutions that govern the most diverse collections. While the library sector has 'systematically and cooperatively created and shared' metadata standards since the 1960s, the museum sector, mostly handling images and objects, has been slower to establish such collaboration and consensus. [@dahlgren_diversity_2020 p. 244]

In this context, I want to focus on some metadata standards that have proved vital across libraries, archives, museums and galleries. These standards, which I will briefly describe, serve as the foundation for organising, describing, and enabling efficient access to vast and diverse collections. Of particular interest is CIDOC-CRM, which I examine more closely as it serves as the cornerstone of Linked Art, a fundamental LOUD standard.

#### 3.2.3.1 Library Metadata Standards

In libraries, several metadata standards have played crucial roles in organising and accessing collections over the years.
The most prevalent historical standard, MARC[32], began as a pilot project in the 1960s, funded by the CLIR and led by the LoC, to structure cataloguing data and distribute them on magnetic tapes [@avram_marc_1968 p. 3]. The standard evolved into MARC21 in 1999 [@zeng_metadata_2022 p. 418] – as exemplified by Code Snippet 3.1 – providing a structured format for bibliographic records and related information in machine-readable form. It uses codes, fields, and sub-fields to structure data. Another significant historical standard is the AACR, published in 1967 and revised in 1978, which provides sets of rules for the descriptive cataloguing of various types of information resources.

Code Snippet 3.1: MARC21 Record of @zeng_metadata_2022 in the Swisscovery Platform

```
leader 01424nam a2200397 c 4500
001 991170746542405501
005 20220427104002.0
008 210818s2022 xxu b 001 0 eng
010 ##$a 2021031231
020 ##$a9780838948750 $qBroschur
020 ##$a0838948758
035 ##$a(OCoLC)1264724191
040 ##$aDLC $bger $erda $cDLC $dCH-ZuSLS UZB ZB
042 ##$apcc
050 00$aZ666.7 $b.Z46 2022
082 00$a025.3 $223
082 74$a020 $223sdnb
100 1#$aZeng, Marcia Lei $d1956- $4aut $0(DE-588)136417035
245 10$aMetadata $cMarcia Lei Zeng and Jian Qin
250 ##$aThird edition
264 #1$aChicago $bALA Neal-Schuman $c2022
300 ##$axxvi, 613 Seiten $bIllustrationen
336 ##$btxt $2rdacontent
337 ##$bn $2rdamedia
338 ##$bnc $2rdacarrier
504 ##$aIncludes bibliographical references and index
650 #0$aMetadata
650 #7$aMetadata $2fast $0(OCoLC)fst01017519
650 #7$aMetadaten $2gnd $0(DE-588)4410512-5
776 08$iErscheint auch als $nOnline-Ausgabe $tMetadata $z9780838937969
776 08$iErscheint auch als $nOnline-Ausgabe $tMetadata $z9780838937952
700 1#$aQin, Jian $d1956- $4aut $0(DE-588)1056085541
856 42$3Inhaltsverzeichnis $qPDF $uhttps://urn.ub.unibe.ch/urn:ch:slsp:0838948758:ihv:pdf
900 ##$aOK_GND $xUZB/Z01/202203/klei
900 ##$aStoppsignal FRED $xUZB/Z01/202203
949 ##$ahttps://urn.ub.unibe.ch/urn:ch:slsp:0838948758:ihv:pdf
```

AACR is no longer maintained; it was replaced around 2010 by RDA[33], a standard more adaptive to contemporary needs. RDA, while not a markup language like MARC, serves as a content standard that guides the description and discovery of resources, focusing on user needs and facilitating improved navigation of library collections. Its goal is to provide a flexible and extensible framework for the description of all types of resources, ensuring discoverability, accessibility, and relevance for users[34] [@sprochi_where_2016 p. 130].

Libraries often leverage other standards to enrich their metadata practices. MODS[35], introduced in 2002, offers a more flexible XML-based schema for bibliographic description, allowing for better integration with other standards and systems. It was initially developed to carry selected data from MARC 21 records [@zeng_metadata_2022 p. 423]. MODS provides a balance between human readability and machine processing, making it suitable for a wide range of resources and use cases [@guenther_mods_2003 p. 139]. METS[36], on the other hand, is a standard for encoding descriptive, administrative, and structural metadata about objects within a digital library. Developed as an initiative of the DLF, METS provides a flexible and extensible framework for structuring metadata, allowing for the packaging of complex digital objects [@cantara_mets_2005 pp. 238-239]. While MODS is primarily concerned with bibliographic information, METS focuses on structuring metadata for digital objects, making it particularly useful for digital libraries and repositories.
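To make the contrast with MARC21 tangible, the following sketch re-expresses a few fields of the record from Code Snippet 3.1 in MODS. It is a deliberately minimal illustration – most fields are omitted and the selection and encoding choices are mine – rather than an authoritative conversion:

```xml
<!-- Illustrative MODS rendering of part of Code Snippet 3.1;
     element names follow the MODS schema, the field selection is a sketch. -->
<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Metadata</title></titleInfo>
  <name type="personal">
    <namePart>Zeng, Marcia Lei</namePart>
    <role><roleTerm type="text">author</roleTerm></role>
  </name>
  <name type="personal">
    <namePart>Qin, Jian</namePart>
    <role><roleTerm type="text">author</roleTerm></role>
  </name>
  <originInfo>
    <place><placeTerm type="text">Chicago</placeTerm></place>
    <publisher>ALA Neal-Schuman</publisher>
    <dateIssued>2022</dateIssued>
    <edition>Third edition</edition>
  </originInfo>
  <identifier type="isbn">9780838948750</identifier>
  <subject><topic>Metadata</topic></subject>
</mods>
```

Where MARC addresses fields by numeric tags (245, 264), MODS names them, which is what makes the schema more legible to humans and easier to combine with other XML standards such as METS.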
A further important standard is FRBR, a conceptual framework for understanding and structuring bibliographic data and access points. Originally developed by IFLA in 1997 as part of its functional requirements family of models, FRBR describes three main groups of entities, relationships, and attributes, as illustrated in Figure 3.2. The first group of entities forms the foundation of the model and characterises four levels of abstraction: WEMI, i.e. Work, Expression, Manifestation, and Item [@denton_functional_2006 p. 231]. FRBR has had a significant impact on the development of RDA, which is loosely aligned with the principles and structures defined by the conceptual framework. However, as FRBR is not a data model per se, it does not prescribe how to record bibliographic information in day-to-day practice, and it focuses heavily on textual resources[37] [@sprochi_where_2016 pp. 130-131]. Furthermore, @cossham_models_2017 [p. 11] asserts that FRBR and RDA 'don't align well with the ways that users use, understand, and experience library catalogues nor with the ways that they understand and experience the wider information environment'.

Figure 3.2: The FRBR Conceptual Framework. Adapted from @zou_constructing_2018 [p. 36]

Another important standard in the field of library science is the LRM, which was introduced as a comprehensive conceptual framework. It provides a broad understanding of bibliographic data and user-centric design principles, aligning with FRBR. LRM defines key entities, attributes, and relationships important for bibliographic searches, interpretation, and navigation, as shown in Figure 3.3. It operates at the conceptual level and does not dictate data storage methods. Attributes in LRM can be represented as literals or URIs. The model is presented in a structured document format to support LOD applications and reduce ambiguity. During its development, a parallel process created FRBRoo (see 3.2.3.3), a model that extends the original FRBR model by incorporating it into CIDOC-CRM. FRBRoo focuses on CH data and is more detailed than LRM, which is designed specifically for library data and follows a high-level, user-centric approach [@riva_ifla_2017 pp. 9-13]. The LRM model, known as LRMer[38], was released in 2020 by IFLA [@zeng_metadata_2022 p. 163].

Figure 3.3: Overview of Relationships in LRM [@riva_ifla_2017 p. 86]

BibFrame[39] is another metadata standard in the library domain. It was initiated around 2011 by the LoC as a successor to MARC, which had become obsolete [see @tennant_marc_2002] and invisible to web crawlers and search engines, preventing adequate discoverability of bibliographic resources [@sprochi_where_2016 p. 132]. BibFrame is a loosely RDF-based model [@sanderson_linked_2015] intended to improve the interoperability and discoverability of library resources. While the BibFrame model may not correspond perfectly with the WEMI entities outlined in FRBR, BibFrame resources can be effectively linked to FRBR entities, ensuring their compatibility [@sprochi_where_2016 p. 133]. BibFrame aims to transition from MARC by providing a more web-friendly framework, focusing on the relationships between entities, improving data sharing, and accommodating the digital environment. Conversely, @edmunds_bibframe_2023 argues that BibFrame is unaffordable and leads to elitism within libraries, with the main beneficiaries being well-funded institutions, particularly in North America, while placing a financial burden on others.
This approach, endorsed by bodies such as the LoC, is criticised for its high cost, impracticality, inequity and limited benefits for cataloguers, libraries, vendors and the public they serve. In addition, the author highlights BibFrame's lack of user-friendliness, regardless of the intended users, and criticises the notion of adopting Linked Data for its own sake without substantial practical benefits.

#### 3.2.3.2 Archival Metadata Standards

For archives, metadata standards like EAD[40] and ISAD(G)[41] have been pivotal. EAD, which originated in 1993 and saw its first version released in 1998, provides a hierarchical structure for representing information about archival collections, offering comprehensive descriptions that aid researchers, archivists, and institutions in managing and providing access to archival records. Its goal is to create a standard for encoding finding aids to improve the accessibility and understanding of archival collections [@pitti_encoded_1999 pp. 61-62]. ISAD(G), on the other hand, released in its first version in 1994 by ICA, offers a more general international standard for archival description, providing a framework for describing all types of archival materials, including fonds, sub-fonds, series, files, and items [@shepherd_application_2000 p. 57]. ISAD(G) aims to establish consistent and standardised archival description practices on a global scale, facilitating the sharing and exchange of archival information.

PREMIS[42] is another metadata standard, initially released in 2005 – version 3.0, the latest specification, was published in 2016 – that focuses on the preservation of digital objects and consists of four interrelated entities: Object, Event, Agent, and Rights [@caplan_practical_2005 p. 111]. Its main objective is to help institutions ensure the long-term accessibility of data by capturing key details about their creation, format, provenance, and preservation events. It can be seen as an elaboration of OAIS, which categorises the information required for preservation into several functional entities and types of information package [see @bates_open_2009 pp. 425-426] – as illustrated by Figure 3.4 – expressed through the mapping of preservation metadata onto the conceptual model [@zeng_metadata_2022 pp. 493-494].

Figure 3.4: OAIS Functional Model Diagram by @mathieualexhache_oais_2021
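By way of illustration, the following heavily trimmed and hypothetical EAD (2002) skeleton shows how a finding aid encodes a fonds with a single series; all identifiers, titles and dates are invented:

```xml
<!-- Hypothetical EAD 2002 finding-aid skeleton: a fonds with one series. -->
<ead xmlns="urn:isbn:1-931666-22-9">
  <eadheader>
    <eadid>example-fonds-001</eadid>
    <filedesc>
      <titlestmt><titleproper>Example Family Papers</titleproper></titlestmt>
    </filedesc>
  </eadheader>
  <archdesc level="fonds">
    <did>
      <unittitle>Example Family Papers</unittitle>
      <unitdate normal="1890/1950">1890-1950</unitdate>
      <physdesc><extent>12 boxes</extent></physdesc>
    </did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
      </c01>
    </dsc>
  </archdesc>
</ead>
```

The nested `<c01>` components mirror the multi-level logic of ISAD(G): description proceeds from the fonds downwards, with each level inheriting context from the one above.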
The latest development in metadata standards for archives is RiC, which has been developed since 2012 by ICA [@clavaud_ica_2021 pp. 79-80]. RiC is structured into four complementary parts [@ica_expert_group_on_archival_description_records_2023 p. 1] intended to cover and replace existing archival standards such as ISAD(G):

- RiC Foundations of Archival Description: A brief description of the foundational principles and purposes of archival description.
- RiC Conceptual Model: A high-level framework for archival description[43], as shown in Figure 3.5.
- RiC-O: The ontology[44], which embodies a specific implementation of the conceptual model. It is formally expressed in OWL to make archival description available using LOD techniques – which facilitates extensions [see @mikhaylova_extending_2023] – and adheres to a conceptual vocabulary specific to archival description. It provides the ability to navigate and interpret complex archival holdings and fosters meaningful research and discovery. The ontology includes seven main groups of entities: Record, Agent, Rule, Event, Date, Place, and Instantiation.
- RiC Application Guidelines: A part still in development at the time of writing, which will provide practitioners and software developers with guidance and examples for implementing the conceptual model and the ontology in records and archival management systems.

Figure 3.5: Global Overview of the Core Entities Defined by the RiC Conceptual Model. Slightly Adapted from https://github.com/ICA-EGAD/RiC-O

#### 3.2.3.3 Museum and Gallery Metadata Standards

In the museum and gallery domain, various metadata standards and conceptual models have significantly contributed to the management, organisation, and accessibility of CH objects and artworks. Notable among these are CDWA, CCO, LIDO, CIDOC-CRM, as well as Linked Art. CDWA[45], developed in the mid-1990s and maintained by the Getty Vocabulary Program, and CCO[46], created by the VRA[47] and introduced in the early 2000s, primarily focus on describing art and cultural artefacts, providing a framework for recording essential details such as artist, title, medium, date, and provenance. CDWA is a comprehensive set of guidelines for cataloguing and describing various cultural objects, including artworks, architectural elements, material culture items, collections of works, and associated images. While not a data model itself, it offers a conceptual framework for designing data models and databases, as well as for information retrieval. It later evolved into CDWA Lite, an XML schema for data harvesting purposes [@baca_categories_2017 pp. 1-2]. CCO comprises both rules and examples drawing on the CDWA categories and VRA Core 4.0 for describing, documenting, and cataloguing cultural works and their visual surrogates[48] [@coburn_cataloging_2010 pp. 17-18]. Both CCO and CDWA are standards that CIDOC[49] recommends and supports for museum documentation.

LIDO[50] is a CIDOC standard introduced in the early 2000s which offers a lightweight XML-based serialisation for describing museum-related information, as shown in Code Snippet 3.2. It provides a format for the interchange of data about art and CH objects, complementing CDWA and CCO as it integrates and extends CDWA Lite with elements of CIDOC-CRM [@stein_using_2019 p. 1025]. Ultimately, LIDO's goal is to enhance interoperability, accessibility, and the sharing of collection information, enabling institutions to connect and showcase their collections in diverse contexts [@coburn_lido_2010 p. 3]. LIDO is also the name of a CIDOC Working Group; such groups are created to tackle particular issues or areas of interest[51].
Code Snippet 3.2: Example of a LIDO Object in XML from @lindenthal_lido_2023

```xml
<lido:lido>
  <lido:lidoRecID lido:source="ld.zdb-services.de/resource/organisations/DE-Mb112"
                  lido:type="http://terminology.lido-schema.org/lido00099">
    ld.zdb-services.de/resource/organisations/DE-Mb112/lido/obj/00076417
  </lido:lidoRecID>
  <lido:descriptiveMetadata xml:lang="en">
    <lido:objectClassificationWrap>
      <lido:objectWorkTypeWrap>
        <lido:objectWorkType>
          <skos:Concept rdf:about="http://vocab.getty.edu/aat/300033799">
            <skos:prefLabel xml:lang="en">oil paintings (visual works)</skos:prefLabel>
          </skos:Concept>
        </lido:objectWorkType>
      </lido:objectWorkTypeWrap>
    </lido:objectClassificationWrap>
    <lido:objectIdentificationWrap>
      <lido:titleWrap>
        <lido:titleSet>
          <lido:appellationValue lido:pref="http://terminology.lido-schema.org/lido00169"
                                 xml:lang="en">
            Mona Lisa
          </lido:appellationValue>
        </lido:titleSet>
      </lido:titleWrap>
    </lido:objectIdentificationWrap>
  </lido:descriptiveMetadata>
</lido:lido>
```

CIDOC-CRM[52], developed since 1996 by CIDOC and more specifically maintained by the CRM-SIG – which convenes quarterly[53] – is a formal and top-level ontology that offers a comprehensive conceptual framework for describing CH resources, allowing for a deep understanding of relationships between different entities, events, and concepts for museums [@doerr_cidoc_2003 pp. 75-76]. It aims to provide a common semantic framework for information integration, supporting robust knowledge representation and fostering collaboration and interoperability within the CH sector, as it can also mediate different resources from libraries and archives. The latest stable version of the conceptual model, version 7.1.2[54], was published in June 2022 and comprises 81 classes and 160 properties[55] [see @bekiari_cidoc_2021].

Within the base ontology of CIDOC-CRM – or CRMBase – and despite the emergence of new developments and gradual changes, there is a fundamental and stable core that can be succinctly outlined. This fundamental structure acts as a basic orientation for understanding the way in which data are structured within CIDOC-CRM. Examining the hierarchical structure of CIDOC-CRM, one can identify the main top-level branches, namely:

- E18 Physical Thing: This class comprises all persistent physical items with a relatively stable form, human-made or natural.
- E28 Conceptual Object: This class comprises non-material products of our minds and other human-produced data that have become objects of a discourse about their identity, circumstances of creation or historical implication. The production of such information may have been supported by the use of technical devices such as cameras or computers.
- E39 Actor: This class comprises people, either individually or in groups, who have the potential to perform intentional actions of kinds for which someone may be held responsible.
- E53 Place: This class comprises extents in the natural space we live in, in particular on the surface of the Earth, in the pure sense of physics: independent from temporal phenomena and matter. They may serve to describe the physical location of things or phenomena or other areas of interest.
- E2 Temporal Entity: This class comprises all phenomena, such as the instances of E4 Periods and E5 Events, which happen over a limited extent in time.

Complemented by entities tailored for the documentation of E41 Appellation and E55 Type, this structure – as shown in Figure 3.6 – provides a potent set of means to capture a broad range of general-level CH reasoning in a holistic manner [@bruseker_cultural_2017 pp. 111-112].

Figure 3.6: CIDOC-CRM Top-Level Categories by @bruseker_cultural_2017 [p. 112]
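To give a feel for how these branches combine in practice, the following hypothetical RDF/XML fragment describes the same painting as Code Snippet 3.2 in CRMBase terms. The URIs under example.org are invented, and the choice of classes and properties is an illustrative sketch rather than a canonical mapping:

```xml
<!-- Hypothetical CRMBase sketch: an E22 object, its appellation,
     and the E12 Production event linking actor and place. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:crm="http://www.cidoc-crm.org/cidoc-crm/">
  <crm:E22_Human-Made_Object rdf:about="https://example.org/object/mona-lisa">
    <crm:P1_is_identified_by>
      <crm:E41_Appellation>
        <crm:P190_has_symbolic_content>Mona Lisa</crm:P190_has_symbolic_content>
      </crm:E41_Appellation>
    </crm:P1_is_identified_by>
    <crm:P108i_was_produced_by>
      <crm:E12_Production>
        <crm:P14_carried_out_by>
          <crm:E21_Person rdf:about="https://example.org/person/leonardo-da-vinci"/>
        </crm:P14_carried_out_by>
        <crm:P7_took_place_at>
          <crm:E53_Place rdf:about="https://example.org/place/florence"/>
        </crm:P7_took_place_at>
      </crm:E12_Production>
    </crm:P108i_was_produced_by>
  </crm:E22_Human-Made_Object>
</rdf:RDF>
```

Reading bottom-up, the object (E18 branch) is tied through an event (E2 branch) to an actor and a place: this event-centric pattern is what distinguishes CIDOC-CRM from flat, record-like formats such as LIDO.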
CRMBase is supplemented by a series of extensions – sometimes referred to as the CIDOC-CRM family of models – intended to support various types of specialised research questions and documentation, such as bibliographic records or geographical data. These compatible models[56], ordered alphabetically, include both works in progress and models under review by the CRM-SIG[57]. They comprise the following:

- CRMact[58]: An extension that defines classes and properties for integrating documentation records about plans for future activities and future events.
- CRMarchaeo[59]: An extension of CIDOC-CRM created to support the archaeological excavation process and all the various entities and activities related to it.
- CRMba[60]: An ontology for documenting archaeological buildings. Its primary purpose is to facilitate the recording of evidence and material changes in archaeological structures.
- CRMdig[61]: An ontology to encode metadata about the steps and methods of production ('provenance') of digitisation products and synthetic digital representations such as 2D, 3D or even animated models created by various technologies.
- CRMgeo[62]: An ontology intended to be used as a global schema for integrating spatio-temporal properties of temporal entities and persistent items. Its primary purpose is to provide a schema consistent with CIDOC-CRM to integrate geoinformation using the conceptualisations, formal definitions, encoding standards and topological relations.
- CRMinf[63]: An extension of CIDOC-CRM that facilitates argumentation and inference in descriptive and historical fields. It serves as a universal schema for merging metadata related to argumentation and inference, primarily focusing on these disciplines.
- CRMsci[64]: The Scientific Observation Model, an ontology that extends CIDOC-CRM for scientific observation, distinguishing the process from its results and providing a formal ontology for scientific data integration and research modelling.
- CRMsoc[65]: An ontology for integrating data about social phenomena and constructs that are of interest in the humanities and social sciences, based on the analysis of documentary evidence.
- CRMtex[66]: An extension of CIDOC-CRM created to support the study of ancient documents by identifying relevant textual entities and by modelling the scientific process related to the investigation of ancient texts and their features.
- FRBRoo[67]: An ontology intended to capture and represent the underlying semantics of bibliographic information, interpreting the conceptualisations of the FRBR framework.
- PRESSoo[68]: An ontology intended to capture and represent the underlying semantics of bibliographic information about continuing resources, more specifically periodicals (journals, newspapers, magazines, etc.). PRESSoo is also an extension of FRBRoo.

Figure 3.7 shows CRMBase and eight of the extensions outlined above in a pyramid shape, where the lower a model sits in the pyramid, the more specialised its concepts.

Figure 3.7: CIDOC-CRM Family of Models. Diagram provided by Maria Theodoridou (Institute of Computer Science, FORTH)
Diagram done and provided by Maria Theodoridou (Institute of Computer Science, FORTH)

Linked Art[69], a recent addition to this landscape, is a community-driven initiative and a metadata application profile that has been in existence since the end of 2016 [@raemy_ameliorer_2022 pp. 136-137]. This community – recognised as a CIDOC Working Group – has created a common Linked Data model based on CIDOC-CRM for describing artworks, their relationships, and the activities around them (see 3.5.5).

3.2.3.4 Cross-domain Metadata Standards

A few cross-domain standards have been used to describe CH resources. For instance, the Dublin Core Elements, containing the original core set of fifteen basic elements, and Dublin Core Metadata Terms[70], its extension, are widely used metadata standards for describing CH resources. They provide metadata properties and classes that are applicable to a wide range of resources [@weibel_dublin_2000]. Another good example is the EDM, which has been specified so that national, regional and thematic aggregators in Europe can deliver resources of content providers to Europeana [see @charles_enhancing_2015; @freire_technical_2019].

Despite the presence of cross-domain standards and efforts to map between standards, whether from one version to another or across different domains, reconciling metadata from various sources remains a significant challenge in the CH sector. Institutions may collect metadata in different ways, using different standards and schemas, making it difficult to merge and compare metadata from different sources. Additionally, metadata may be incomplete, inconsistent, or contain errors, further complicating data reconciliation. To address these challenges, standardised, interoperable metadata are necessary to enable data sharing and reuse. While the use of different metadata standards can present challenges for data reconciliation, the adoption of standardised, interoperable metadata can facilitate data sharing and reuse, promoting the long-term preservation and accessibility of CH resources. Controlled vocabularies – included in what @zeng_metadata_2022 [pp. 24-25] called ‘standards for data value’ – also play an important role here, such as those maintained by the Getty Research Institute[71]: the AAT, the TGN, and the ULAN, as well as various kinds of KOS (see 3.2.4). These vocabularies provide a common language for describing CH objects and can improve the interoperability of metadata across different institutions and communities.

Alongside metadata reconciliation also comes the question of aggregation. Apart from LIDO in museums, the general and current operating model for aggregating CH (meta)data is still OAI-PMH [see @raemy_enabling_2020], an XML-based standard that was initially specified in 1999 and updated in 2002 [@lagoze_open_2002]. Alas, OAI-PMH does not align with contemporary needs [@van_de_sompel_reminiscing_2015], and there are now alternative, web-based technologies for harvesting resources that are slowly being leveraged, such as AS [@snell_activity_2017], a W3C syntax and vocabulary for representing activities and events in social media and other web applications. It can also be easily extended and used in different contexts, as is the case with the IIIF Change Discovery API (see 3.5.3.3) or with ActivityPub [@lemmer-webber_activitypub_2018], a decentralised W3C protocol leveraged by Mastodon[72], a federated and open-source social network.
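To give a flavour of the harvesting model that AS enables, the sketch below polls a hypothetical activity stream (the endpoint URL is invented; the OrderedCollection pagination and activity shape follow the general pattern used by the IIIF Change Discovery API) and lists recently created or updated resources:

```python
import requests

# Hypothetical endpoint; real streams are published by individual
# institutions under their own URLs.
STREAM = "https://example.org/iiif/discovery/activity-stream"

collection = requests.get(STREAM).json()   # an AS OrderedCollection
page_url = collection["last"]["id"]        # newest page of activities

while page_url:
    page = requests.get(page_url).json()   # an OrderedCollectionPage
    # Activities are ordered oldest-to-newest within a page, so a
    # consumer catching up on recent changes walks them backwards.
    for activity in reversed(page.get("orderedItems", [])):
        if activity.get("type") in ("Create", "Update"):
            print(activity["object"]["id"], activity.get("endTime"))
    page_url = page.get("prev", {}).get("id")  # stop at the oldest page
```

Compared with OAI-PMH's periodic full or selective harvests, the design choice here is incremental: a consumer only needs to walk pages back to the last activity it has already processed.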
Overall, the evolution of metadata standards in the CH domain paves the way for a more interconnected and accessible digital environment, thereby providing better access to disparate collections and facilitating cross-domain reconciliation. This transformation is complemented by a growing emphasis on web-based metadata aggregation technologies that are better suited to today’s needs.

3.2.4 Knowledge Organisation Systems

KOS, also known as concept systems or concept schemes, encompass a wide range of instruments in the area of knowledge organisation. They are distinguished by their specific structures and functions [@mazzocchi_knowledge_2018 p. 54]. KOS include authority files, classification schemes, thesauri, topic maps, ontologies, and other related structures. Despite their differences in nature, scope and application, all share a common goal: to facilitate the structured organisation of knowledge and the classification of information. According to @zeng_metadata_2022 [p. 284], ‘KOS have a more important function: to model the underlying semantic structure of a domain and to provide semantics, navigation, and translation through labels, definitions, typing relationships, and properties for concepts’. This overarching intent underpins the practice of information management and retrieval. The term KOS ‘became even more popular after the encoding standard Simple Knowledge Organization System (SKOS) was recommended by W3C’, although the use of such systems can be traced back over 100 years, while others have been created since the advent of the web [@zeng_metadata_2022 p. 188].

According to @hill_integration_2002 [pp. 46-47, citing [@hodge_systems_2000]], KOS can be divided into four main groups: term lists, metadata-like models, classification and categorisation, as well as relationship models. Term lists encompass authority files, dictionaries, and glossaries, serving as controlled sources for managing terms, definitions, and variant names within a knowledge organisation framework. Metadata-like models encompass directories and gazetteers, offering lists of names and associated contact information as well as geospatial dictionaries for named places, which can be extended to represent events and time periods. In the classification and categorisation domain, one finds categorisation schemes and classification schemes that organise content, subject headings that represent controlled terms for collection items, and taxonomies that group items based on specific characteristics. Finally, relationship models feature ontologies, semantic networks, and thesauri, each capturing complex relationships between concepts and terms [@hill_integration_2002; @zeng_knowledge_2008]. Figure 3.8 represents an overview of the structure and functions of these four main groups, showcasing as well the subcategories of KOS previously mentioned. In this figure, the x characters indicate the extent to which each type of KOS embodies five key functions identified by @zeng_knowledge_2008, such as eliminating ambiguity or controlling synonyms.

In this subsection, I will explore four subcategories of KOS, each representing a continuum from a more linear to a more structured network: folksonomy, taxonomy, thesaurus, and ontology. These KOS have been selected due to their significant impact on the organisation and interlinking of data within the contexts of CHI practices and LOD. Furthermore, the intent of these systems is to help bridge the gap between human understanding and machine processing.
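Since SKOS, mentioned above, is the W3C standard most commonly used to encode such systems on the web, a minimal sketch may help fix ideas before turning to the four subcategories; every URI below is invented for illustration:

```python
from rdflib import Graph

# A minimal SKOS concept scheme (all URIs are hypothetical): two
# hierarchically related concepts, with a variant spelling controlled
# via skos:altLabel.
ttl = """
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <https://example.org/kos/> .

ex:scheme a skos:ConceptScheme ;
    skos:prefLabel "Example vocabulary"@en .

ex:photography a skos:Concept ;
    skos:inScheme ex:scheme ;
    skos:prefLabel "photography"@en ;
    skos:narrower ex:colourPhotography .

ex:colourPhotography a skos:Concept ;
    skos:inScheme ex:scheme ;
    skos:prefLabel "colour photography"@en ;
    skos:altLabel "color photography"@en ;   # synonym control
    skos:broader ex:photography .
"""

g = Graph()
g.parse(data=ttl, format="turtle")

# List each concept with its preferred label.
q = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label
WHERE { ?concept a skos:Concept ; skos:prefLabel ?label . }
"""
for row in g.query(q):
    print(row.concept, row.label)
```

The hierarchical (skos:broader/skos:narrower), associative (skos:related), and labelling (skos:prefLabel/skos:altLabel) relations correspond loosely to the term-control and relationship functions summarised in Figure 3.8.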
Figure 3.8: Overview of the Structures and Functions of KOS [@zeng_knowledge_2008 p. 161]

3.2.4.1 Folksonomy

Positioned at one end of the organisational spectrum, folksonomies, also known as community tagging or social bookmarking, are characterised by their user-generated nature. These systems rely on individual users’ tagging of content with keywords or tags that reflect their personal perspectives and preferences. With folksonomies, integration or reconciliation is often hard to achieve [@zeng_metadata_2022 p. 401]. However, they do provide a wealth of source material for studying social semantics [@zeng_metadata_2022 p. 403] and can be maintained in parallel with more structured KOS.

3.2.4.2 Taxonomy

Moving towards the centre of the spectrum, taxonomies present a more structured approach to knowledge organisation [@zeng_knowledge_2008 p. 169]. Taxonomies employ hierarchical classifications to systematically categorise information into distinct classes and sub-classes, or in a parent/child relationship [@saa_dictionary_taxonomy_2023] – as shown by Code Snippet 3.3 [@niso_guidelines_2010 p. 18]. Taxonomy, in this context, extends beyond mere categorisation; it also establishes relationships.

Code Snippet 3.3: Taxonomy Hierarchy

```
Chemistry
  Physical Chemistry
    Electrochemistry
      Magnetohydrodynamics
```

3.2.4.3 Thesaurus

Moving further along the spectrum, thesauri offer a more detailed and formalised method of organisation. They include not only hierarchical relationships but also explicit semantic connections between terms, making them valuable tools for information retrieval. As defined by @niso_guidelines_2010 [p. 9]:

A thesaurus is a controlled vocabulary arranged in a known order and structured so that the various relationships among terms are displayed clearly and identified by standardized relationship indicators.

For instance, consider a thesaurus related to photography, which encompasses categories for various aspects of photography, including photographic techniques, equipment, and materials. Within this thesaurus, ‘Kodachrome’ could be categorised not only as a specific type of colour film but also as a distinct photographic process. As a type, it could fall under the sub-category of ‘colour film photography’, and as a process, it would fit within the broader framework of ‘photographic techniques’. The AAT, commonly employed in the CH domain, stands as a significant example of a thesaurus [@harpring_development_2010 p. 67]. Homosaurus[73] is another example of a thesaurus, with a distinct focus on enhancing the accessibility and discoverability of LGBTQ+ resources and related information. Leveraging Homosaurus in metadata can effectively contribute to diminishing biases present in such data, an essential step in promoting inclusivity and equity within information systems [see @hardesty_mitigating_2021].

3.2.4.4 Ontology

At the structured end of the spectrum, ontologies define complex relationships and attributes between concepts, whereby a series of concepts have been chosen to express what we understand, so that a computer can start making sense of our world. Ontologies are formalised KOS, enabling advanced data integration and KR for more sophisticated applications. The term is drawn from philosophy, where ontology is a discipline concerned with studying the nature of existence, as articulated by @gruber_translation_1993 [pp. 199-200]:

An ontology is an explicit specification of a conceptualization. The term is borrowed from philosophy, where an ontology is a systematic account of Existence.
For knowledge-based systems, what “exists” is exactly that which can be represented.

There are different kinds of ontologies, including axiomatic formal ontologies, foundational ontologies, and domain-specific ontologies [@beretta_interoperabilite_2022]. These different types of ontologies cater to various knowledge representation needs. Foundational ontologies, such as DOLCE[74], provide a high-level framework for modelling knowledge and offer a comprehensive system for representing entities, qualities, and relationships [see @masolo_wonder_2003; @borgo_dolce_2022]. DLs, a family of formal KR languages, also play a key role in developing ontologies and serve as the foundation for OWL (see 3.4.2), notably by providing a logical formalism. DLs are characterised by their ability to provide substantial expressive power that goes well beyond propositional logic, while maintaining decidable reasoning [@chang_abox_2014].

In computer science, the concepts of ABox and TBox, both statements in KBs, are relevant to the structuring and enrichment of KGs [@giacomo_tbox_1996][75]. The ABox, representing the ‘assertion’ or ‘instance’ level, encapsulates concrete data instances and their relationships, contributing to the factual knowledge of a given system. Conversely, the TBox, representing the ‘terminology’ or ‘schema’ level, defines the conceptual framework and hierarchies that govern the relationships and attributes of the instances. These two complementary components work in harmony to improve data interoperability, reasoning and knowledge sharing. Figure 3.9 depicts a high-level overview of a KB representation system.

Figure 3.9: Knowledge Base Representation System Based on @patron_embedded_2011 [p. 205]

Consider a scenario around artwork provenance held in a museum. The ABox strives to encapsulate the rich narratives of individual artworks, tracing their journey through time, ownership transitions and exhibition travels. At the same time, the TBox creates a conceptual scaffolding, imbued with classes such as Artwork, Creator, and Exhibition, painting an abstract portrait that contextualises each artefact within a broader cultural tapestry. It is here that DL comes in, harmonising the symphony with its logical relationships and axioms, i.e. rules or principles widely accepted as obviously true [@baader_13_2007]. In DL notation, such a KB is represented as 𝒦 = (𝒯, ℛ, 𝒜), where:

- 𝒯 represents the TBox, defining the conceptual framework, which encompasses the hierarchical relationships, classes, and concepts within the KB.
- ℛ represents the set of binary roles, delineating the relationships and connections between individuals or instances in the domain. These roles facilitate the understanding of how entities relate to one another within the KB.
- 𝒜 represents the ABox, encompassing the specific assertions or instances in the KB.

This symbiotic interplay ensures that the provenance of each artwork is not just a static account, but a dynamic, interconnected narrative. The ABox-TBox relationship thrives in the realm of reasoning. Imagine an axiom embedded in the TBox: ‘A work of art presented in an exhibition curated by a distinguished patron is of heightened cultural significance’, or phrased in DL terms:

Artwork ⊓ ∃presentedIn.(Exhibition ⊓ ∃curatedBy.DistinguishedPatron) ⊑ CulturallySignificant

This axiom serves as a beacon to guide the system’s reasoning.
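As a minimal sketch of how this axiom could be operationalised – the class, role and instance names are invented for the purpose of the example, and a SPARQL query stands in for a DL reasoner – consider:

```python
from rdflib import Graph

# TBox: invented classes for the museum scenario.
# ABox: one artwork woven into an exhibition curated by a patron.
ttl = """
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <https://example.org/museum/> .

# TBox (terminology)
ex:Artwork             a rdfs:Class .
ex:Exhibition          a rdfs:Class .
ex:DistinguishedPatron a rdfs:Class .

# ABox (assertions)
ex:nightWatch a ex:Artwork ;
    ex:presentedIn ex:retrospective1906 .
ex:retrospective1906 a ex:Exhibition ;
    ex:curatedBy ex:patronX .
ex:patronX a ex:DistinguishedPatron .
"""

g = Graph()
g.parse(data=ttl, format="turtle")

# The query mirrors the left-hand side of the axiom; a DL reasoner
# working over an equivalent OWL TBox would instead classify matching
# works as culturally significant automatically.
q = """
PREFIX ex: <https://example.org/museum/>
SELECT ?work WHERE {
  ?work a ex:Artwork ;
        ex:presentedIn ?exh .
  ?exh  a ex:Exhibition ;
        ex:curatedBy ?patron .
  ?patron a ex:DistinguishedPatron .
}
"""
for row in g.query(q):
    print(row.work)   # -> https://example.org/museum/nightWatch
```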
When an ABox instance of an artwork is woven into an exhibition curated by a prominent authority, the DL-informed engine responds by inferring an enriched cultural value that resonates beyond the artefact itself. This is where the TBox takes data and gives it life, producing insights that transcend the boundaries of individual instances. The KB, 𝒦, captures this orchestration, encapsulating the logical relationships for meaningful interpretation and knowledge discovery. Overall, the relationship between ABox and TBox in DL is vital for achieving semantic clarity, enabling meaningful data integration, and facilitating advanced reasoning mechanisms. The museum provenance scenario showcases a precisely orchestrated convergence of assertion, terminology, and rigorous logical reasoning. This engenders a computational landscape where historical artefacts intricately mesh within the complex network of human history’s data structures, seamlessly aligning with the underlying framework of algorithmic representation. These components enable software developers to harmonise disparate datasets, extract insightful knowledge, and support decision-making processes across a wide range of domains. In essence, the use of DL, ABox, and TBox in ontological KR enhances interoperability between different systems and allows for sophisticated reasoning and decision support.

Moving beyond these foundational concepts, it is worth considering the work of @ehrlinger_towards_2016, who address the need for a clear and standardised definition of KGs. They highlight the term’s varied interpretations since its popularisation by Google in 2012 and propose a definitive, unambiguous definition to foster a common understanding and wider adoption in both academic and commercial realms. They define a KG as follows: ‘A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge’. This definition crystallises the essence of KGs as dynamic and integrative systems that not only store but also process and enrich data through advanced reasoning. This conceptualisation underlines the transformative potential of KGs in various domains, bridging the gap between raw data and actionable insights.

Finally, it is important to recognise that the importance of ontologies extends beyond individual systems. Shared ontologies are a cornerstone of semantic interoperability, thus facilitating a paradigm shift in the way systems and applications communicate. As @sanderson_rdf_2013 argues: ‘shared ontologies increases semantic interoperability’ and ‘shared identity makes it possible for graph to merge serendipitously’. This shared understanding ensures that various entities can seamlessly connect and engage in meaningful interactions.

3.3 Trends, Movements, and Principles

Technological trends, scientific movements, and guiding principles have played a crucial role in shaping the landscape of contemporary research. In recent years, there has been an increased emphasis on the need for academic and CH practices to be more transparent, inclusive, and accountable. This shift reflects a broader trend towards integrating advanced technological solutions and open-science principles in heritage management. As such, understanding the evolution of CH becomes imperative to comprehend how these practices have adapted and transformed in response to these guiding trends. The evolution of CH has been characterised by a series of technological and methodological shifts.
Initially, the primary focus was on digitising physical artefacts to preserve information from degrading originals. This phase was crucial for transitioning tangible CH into a digital format, mitigating the risk of loss due to physical degradation. Following this, efforts shifted towards ensuring the persistence of digitised resources. This stage involved addressing challenges related to digital preservation, including data degradation and format obsolescence, to ensure the longevity of digital cultural assets. The advent of open data principles marked the next phase in CH development. This approach facilitated broader access to information, aligning with contemporary values of transparency and inclusivity in governmental, academic, and cultural contexts. Subsequently, the focus expanded to enhancing the utility of this data. This stage involved contextualising and enriching CH data, thereby increasing their applicability and relevance across various domains. The current frontier in CH involves developing applications that leverage rich CH data. These applications serve not only as tools for engagement and education but also as justifications for the ongoing costs associated with data storage and archival. They illustrate the tangible benefits derived from preserving heritage resources, encompassing both cultural and economic returns.

In summary, the trajectory of CH development mirrors broader technological and societal trends, transitioning from preservation to active utilisation. This progression underscores the dynamic nature of research and CH processes, highlighting the evolving requirements for transparency, inclusivity, and accountability in CH management.

While automation has significantly enhanced the efficiency of digitisation processes in CH, cataloguing and indexing remain complex challenges. The intricacies involved in accurately understanding and categorising resources necessitate more than just technological solutions; they require context-aware and culturally sensitive approaches. Here, ML offers promising perspectives. ML, particularly in its advanced forms like deep learning, can assist in cataloguing and indexing by analysing large datasets to identify patterns, categorise content, and even suggest metadata. This can be particularly useful in handling large volumes of CH data, where manual processing is time-consuming and prone to human error. Typical applications of ML in this field include image recognition for identifying and classifying visual elements in artefacts, NLP for analysing textual content, and pattern recognition for sorting and organising data based on specific characteristics. Furthermore, prospective developments may entail the refinement of metadata mapping and the enhancement of quality control mechanisms. Moreover, ML algorithms can be trained to recognise stylistic elements, historical contexts, and other nuances that are essential for accurate cataloguing in CH. However, it is crucial to note that the effectiveness of ML depends heavily on the quality and diversity of the training data. Biases in this data can lead to inaccuracies in cataloguing and indexing. Thus, a collaborative approach, where ML is supplemented by expert human oversight, is often the most effective strategy.
Overall, this section provides a comprehensive overview of three[26:1] technological trends as well as five key scientific movements and guiding principles that are shaping research and how universities and GLAMs should provide environments, services, and tools with a view to collecting and disseminating content. By exploring each of these trends, movements, and principles, we can gain a deeper understanding of how research and CH processes are permeated by dynamic movements and how resources can be made more transparent, inclusive and accountable, as well as how data can be made available to human and non-human users.

3.3.1 Current and Emerging Technological Trends in Cultural Heritage

I will explore some current and emerging technological trends in CH, organised into three components: Linked Data, big data, and AI. Each represents a critical driver shaping the landscape and practices of heritage data. The three trends have been around for a few decades, with the ‘Linked Data’ principles and underlying standards coming from the late 1990s, ‘big data’ being coined in 1990 and AI in 1956. Before considering the trends discussed hereafter, note that current technological developments do not exist in isolation, but tend to intertwine and act synergistically. A vivid example of this interplay can be seen in AI and its latent impact on the semantic web, particularly in facilitating more efficient querying and crawling processes, such as the LinkedDataGPT proof-of-concept service[76] from Liip, which layers ChatGPT – a generative AI solution – on top of the City of Zurich’s Linked Data portal to facilitate querying open datasets [@stocker_use_2023]. Inversely, AI can be fed by data on the web to learn and reason, as outlined by @gandonWebScienceArtificial2019.

3.3.1.1 Linked Data

Linked Data, and more precisely LOD, is a set of design principles relying on RDF, a significant approach for interconnecting data on the web in order to make semantic queries more useful [@berners-lee_semantic_2001]. In other words, this standardisation allows data to be not only linked, but also openly accessible and reusable. As noted by @gandonWebScienceArtificial2019 [p. 115, citing [@gandon_pour_2017]]:

The Web was initially perceived and used as a globally distributed hypertext space for humans. But from its inception, the Web has always been more: its hypermedia architecture is in fact linking programs world-wide through remote procedure calls.

This deeper understanding of the web’s architecture as a conduit for linking programs on a global scale holds profound implications. It signifies that the web is not merely a medium for accessing information but a dynamic environment where data-driven programs interact, exchange data, and collaborate across geographical boundaries. In this context, Linked Data emerges as a powerful enabler, providing a structured and standardised approach for these programs to communicate and share meaningful data [@bizer_linked_2008]. In the context of CH, institutions such as museums, libraries and archives can publish their collections using Linked Data principles, enabling a web of linked information that is accessible to all. As this dissertation’s main topic revolves around Linked (Open) (Usable) Data, two dedicated sections have been written within this literature review in Section 3.4 and Section 3.5. Beyond formal LOD, CHIs may also link their databases or collections in more informal ways.
This interconnection may take the form of shared metadata, common identifiers, or simply hyperlinks. These links can enhance the user experience by supporting a more seamless navigation between related items or pieces of information. For instance, a parallel strategy is the use of graph-based data representation, i.e. a property graph, which consists of a set of objects or vertices and a set of arrows or edges connecting them, and which is most likely not RDF-compliant [see @bermes_modelisons_2023]. Graph databases, such as Neo4j[77], which is quite prevalent in DH [see @webber_programmatic_2012; @drakopoulos_semantically_2019; @darmont_data_2020], allow for efficient storage and retrieval of interconnected data through nodes representing entities and relationships linking them.

3.3.1.2 Big Data

Big Data refers to extremely large and complex datasets that exceed the capabilities of traditional data processing methods and tools. It encompasses a massive volume of structured, semi-structured and unstructured data that is currently flooding across a variety of sectors, companies and organisations [see @emmanuel_defining_2016]. The characteristics of big data are often described by the three Vs model [@laney_3d_2001]:

- Volume: Big data refers to a massive amount of data. This can encompass a spectrum of data sizes, extending from GB and TB, to PB[78] and beyond. The sheer size of the data is a key aspect of big data, making traditional database systems inadequate for storage and analysis.
- Velocity: Data is being generated and collected at an unprecedented rate. Social media posts, sensor data, online transactions and more are constantly being generated, requiring real-time or near real-time processing and analysis.
- Variety: Big data comes in a variety of formats, including structured data (e.g. databases), semi-structured data (e.g. XML, JSON) and unstructured data (e.g. text, images, video). The variety of data types requires flexible processing methods.

In addition to the three Vs model, two more characteristics are often included [@saha_data_2014 p. 1294]:

- Veracity: It refers to the quality of the data, including its accuracy, reliability and trustworthiness. Big data sources can be inherently uncertain or inaccurate, and addressing data quality is a critical challenge.
- Value: Extracting value and actionable insights from big data is the ultimate goal. Analysing and interpreting big data should lead to better decision-making, improved business strategies, as well as enhanced UX[79].

Regarding the two latter dimensions, @debattista_linked_2015 argue that Linked Data is the most suitable technology to increase the value of data over conventional formats, thus contributing towards the value challenge in Big Data. As for veracity, they describe a semantic pipeline with eight key metrics to address the veracity dimension. Building on this technological foundation, the integration of Linked Data and Big Data analytics takes centre stage. Big data analytics can be employed on CH content to uncover insights and correlations that can be used in decision-making. @barrile_big_2022 [p. 2708] highlight the transformative potential of using big data by investigating how analytical approaches can enhance conservation strategies, aid resource allocation and optimise the management of CH resources. @poulopoulos_digital_2022 [pp. 188-189] emphasise that emerging technology trends, including big data, have a significant impact on related research areas such as CH.
Big data primarily originates from sources such as social media, online gaming, data lakes[80], logs and frameworks that generate or use significant amounts of data. They stress that the incorporation of multi-faceted analytics in the CH domain is an area of active research, and present a data lake that provides essential user and data/knowledge management functionalities. However, they emphasise a crucial consideration – the need to bridge the theoretical foundations of disciplines such as cultural sociology with the technological advances of big data.

3.3.1.3 Artificial Intelligence

The term AI was coined by John McCarthy, an American computer scientist and cognitive scientist, during the 1956 Dartmouth Conference, which is often considered the birth of AI as an academic field [@andresen_john_2002 p. 84]. According to the @oxford_english_dictionary_artificial_2023, AI is described as follows:

The capacity of computers or other machines to exhibit or simulate intelligent behaviour; the field of study concerned with this. In later use also: software used to perform tasks or produce output previously thought to require human intelligence, esp. by using machine learning to extrapolate from large collections of data.

While AI is not the central focus of my PhD thesis, I acknowledge its impact in several instances. As a rapidly developing technology, AI has the potential to significantly transform various aspects of society, including the way we describe, analyse, and disseminate CH resources. It is worth mentioning that I endeavour to engage in a broader discourse concerning the domain of AI. In this context, I use the acronym AI to talk about the overarching domain or its ethics, and ML to discuss the specifics of methodologies and algorithmic approaches, while refraining from delving into the intricacies of Deep Learning, which is a distinct subdomain within ML.

AI and ML offer great potential for digitising, curating and analysing CH, leveraging the vast digital datasets from CHIs. Some of the examples include text recognition mechanisms using OCR and HTR, NLP and NER for enriching unstructured text, as well as object detection methods for finding patterns within still and moving images [@neudecker_cultural_2022; @sporleder_natural_2010]. Textual works can also be analysed, for instance for sentiment analysis [see @susnjak_applying_2023], and generated using LLMs – a class of NLP models, such as BERT or ChatGPT, which predict the likelihood of a word given the previous words present in recorded texts. However, challenges such as data quality and biases in AI persist [@neudecker_cultural_2022]. In addition, there are still uncertainties regarding the licensing and reuse of CH datasets by ML algorithms[81]. @neudecker_cultural_2022 emphasises the importance of well-curated digitised CH resources that are openly licensed, accompanied by relevant metadata, and accessible through APIs or download dumps in various formats. These curated resources have the potential to address the existing gap in this domain. Building on the theme of enhancing CH through digital technologies, @mcgillivray_digital_2020 explore the synergies and challenges found at the intersection of DH and NLP. DH is aptly described as ‘a nexus of fields within which scholars use computing technologies to investigate the kinds of questions that are traditional to the humanities […] or who ask traditional kinds of humanities-oriented questions about computing technologies’ [@fitzpatrick_reporting_2010].
This broad characterisation encapsulates the transformative potential of digital tools, including ML techniques, in enriching humanities research. @mcgillivray_digital_2020 highlight the critical need for bridging the communication gap between DH and NLP to drive progress in both fields. They propose increased interdisciplinary collaboration, encouraging DH researchers to actively utilise NLP tools to refine their research methodologies. A primary challenge in this convergence is the application of NLP to the complex, historical, or noisy texts often encountered in DH research. They conclude by advocating for stronger cooperation between practitioners in these fields. This collaborative effort is vital for harnessing the full potential of ML in analysing and interpreting CH.

The use of ML scripts in the context of CH – and beyond – is inherently limited by their applicability, namely when dealing with historical photographs. In such cases, the use of algorithms that are mostly trained and grounded in contemporary image data becomes quite incongruous due to the dissimilarity in temporal contexts. This dilemma is exemplified by datasets such as Microsoft’s Common Objects in Context (COCO)[82] [@fleet_microsoft_2014], where the available data are predominantly contemporary photographic content, which is misaligned with the historical nuances inherent in most of the digitised CH images. @coleman_managing_2020 corroborates that a sound approach would be for ML practitioners to collaborate with libraries, as they can draw practical lessons from critical data studies and the thoughtful integration of AI into their collections, using guidelines from DH. She also argues that merely handing over datasets would be a disservice to library patrons, and that ‘Librarians need to master the instruments of AI and employ them both to learn more about their own resources—to see and analyze them in new ways—and to help shape applications of AI with the expertise and ethos of libraries.’

Ethical concerns, particularly regarding social biases and racism, are prevalent in technologies like ImageNet, where facial recognition may yield AI statements with strong negative connotations [@neudecker_cultural_2022]. Addressing this, @gandonWebScienceArtificial2019 suggest the production of AI services that are ‘benevolent-by-design for the good of the Web and society’. Furthermore, @floridi_good_2023 introduces the double-charge thesis, asserting that all technology design is a moral act, challenging the neutrality thesis. He emphasises that technologies are not neutral and can be influenced by a dynamic equilibrium of values, predisposing them towards morally good or evil directions.

As mentioned previously, ML training datasets are often not representative enough to be properly leveraged in the CH sector [@strien_introduction_2022]. Fine-tuning is, however, now an active topic, and new ground-truth datasets have been created and tailored for the needs of CH, such as Viscounth[83], a large-scale VQA dataset – i.e. a dataset containing open-ended questions about images which requires an understanding of vision, language and commonsense knowledge to answer [@goyal_making_2017] – for CH in English and Italian [see @becattini_viscounth_2023]. @jaillant_unlocking_2022 argue that the governance of AI ought to be carried out in partnership with GLAM institutions.
However, while this collaboration has been proposed as a promising way forward, it still requires further exploration and evaluation, particularly with regard to the specific challenges and opportunities that it presents. On the one hand, the involvement of GLAMs in AI governance could enhance the development of digital CH projects that promote social justice and equity. On the other hand, this collaboration raises several challenges, such as the need to address issues of privacy, data protection, and intellectual property rights, and to ensure that the values and perspectives of GLAM professionals are adequately represented in the development of AI algorithms and systems. Therefore, it is crucial to examine the specific challenges and opportunities of this collaboration and to develop appropriate frameworks and guidelines that enable effective and ethical governance of AI in the GLAM sector. One platform that addresses these issues is AI4LAM, an international and participatory community focused on advancing the use of AI in, for and by libraries, archives, and museums[84]. The initiative was launched by the National Library of Norway and Stanford University Libraries in 2018, inspired by the success of the IIIF community. Another initiative is the AEOLIAN Network[85], AI for Cultural Organisations, which investigates the role that AI can play to make born-digital and digitised cultural records more accessible to users [@jaillant_applying_2023 p. 582].

As an illustrative case, the LoC’s exploration into ML technologies, as highlighted by @allen_why_2023, demonstrates a strategic commitment to enhancing the accessibility and utility of its diverse collections. This initiative reflects the LoC’s acknowledgement of the transformative potential of ML, balanced with a cautious approach due to the necessity for accurate and responsible information stewardship. The LoC faces several challenges in applying ML, particularly the limitations of commercial AI systems in handling its varied materials and the requirement for substantial human intervention. This cautious exploration into ML is indicative of a broader trend in CHIs, where maintaining a balance between embracing technological advancements and preserving authenticity and integrity is crucial. The specific experiments and projects undertaken by the LoC in the realm of ML are diverse and illustrative of the institution’s comprehensive approach to innovation. For instance, image recognition systems have been tested for identifying and classifying visual elements in artefacts, a task that requires a nuanced understanding of historical and cultural contexts. In another initiative, speech-to-text technology was employed to transcribe spoken word collections, confronting challenges such as accent recognition and audio quality variation. Additionally, the LoC explored the potential of ML in enhancing search and discovery capabilities through projects like Newspaper Navigator[86], which aimed to identify and extract images from digitised newspaper pages. These experiments not only highlight the potential of ML in transforming the way the LoC manages and disseminates its collections but also reveal the complexities and limitations inherent in these technologies. As @allen_why_2023 notes, the ongoing research and experimentation in ML at the LoC are critical in revolutionising access and discovery in the cultural heritage sector.
These efforts, while facing challenges, represent a diligent integration of advanced technologies, upholding principles of responsible custodianship and setting a precedent for similar institutions globally in the adoption and adaptation of ML and AI in CHIs.

The integration of LLM and KG presents a groundbreaking opportunity, particularly within the realm of CHIs, where there is already considerable expertise. This is aptly demonstrated in the work of @pan_large_2023, which elucidates the harmonisation between explicit knowledge and parametric knowledge, i.e. knowledge derived from patterns in data, as learned by models such as LLMs. The authors highlight three key areas for the advancement of KR and processing:

- Knowledge Extraction, where LLMs improve the extraction of knowledge from diverse sources for applications such as information retrieval and KG construction;
- Knowledge Graph Construction, which involves LLMs in tasks such as link prediction and triple extraction from data, albeit with challenges in precision and the management of long-tail entities;
- Training LLMs Using KGs, where KGs provide structured knowledge for LLMs, helping to build retrieval-augmented models on the fly, enriching LLMs with world knowledge and increasing their adaptability.

In a report for the University of Leeds in the UK, @pirgova-morgan_looking_2023 explores the potential and practical implications of AI in libraries. The project, forming part of the university’s ambitious vision for digital transformation, aims to understand how AI can be effectively integrated into library services. This research looks at both the use of general AI for long-term strategic planning and specific AI applications for improving UX, process optimisation and enhancing the discoverability of collections. The methodology used in this study involves a multi-faceted approach including desk-based assessments, a university-wide survey and expert interviews. Specifically, the study highlights the following key findings:

- AI for UX and Process Optimisation: The integration of AI technologies offers substantial opportunities for improving user experiences in libraries. This includes optimising library processes, enhancing collections descriptions, and improving their discoverability.
- Challenges and Opportunities of AI Application: While AI presents exciting possibilities, its practical application in library settings faces challenges. These include evaluating specific AI technologies in the unique context of the University of Leeds, ensuring they align with the institution’s needs and goals.
- Perceptions of AI in Libraries: The report reveals varying perceptions among librarians and users regarding AI. This includes views on how AI can contribute to resilience, awareness of climate change, and practices promoting equality, diversity, and inclusion.
- Role of AI in Strategic Library Development: General AI technologies are seen as instrumental in shaping long-term strategies for libraries, highlighting the need for ongoing adaptation and development in response to evolving AI capabilities.
- Expert Perspectives on AI in Libraries: Interviews with experts from around the world underscore the importance of understanding both general and specific applications of AI. These insights help in identifying priority areas where AI can significantly enhance library operations and services.
These insights from the University of Leeds report illustrate the complex impact of AI on library services, from enhancing user interaction to influencing strategic decision-making, while also emphasising the importance of adapting AI applications to specific institutional needs.

It must also be stated that AI lacks inherent intelligence and consciousness, and has ultimately been built by people. An important concern, notably with LLMs, is the perceptual illusion of cognitive interaction, where the machine appears to be engaging in dialogue and reasoning, when in fact it is generating content through predictive algorithms [see @ridge_enriching_2023]. Furthermore, regarding the topic of data colonialism, poor people in underprivileged nations are often burdened with the responsibility of cleaning up the toxic repercussions of AI, shielding affluent individuals and prosperous countries from direct exposure to its harmful effects[87].

Concluding this segment, it is essential to perceive ML algorithms as uncertain ‘socio-material configurations’, which can be seen as both powerful and inscrutable, demanding an axiomatic and problem-oriented approach in their understanding and application. @jaton_we_2017 elaborates on this by examining how these algorithms, while technologically complex, are firmly rooted in and shaped by the social, material, and human contexts in which they are developed. Beyond their computational complexity, these algorithms are deeply embedded in the process of constructing ground truths. These ground truths are not inherent or fixed; instead, they emerge from collaborative efforts that reflect the varied inputs of actors. This process underscores the algorithms as socio-material constructs, influenced by the characteristics and contexts of their creators. Understanding algorithms in this light highlights their deep integration with human actions and societal norms, offering a more nuanced view of their design and implementation [see @jaton_assessing_2021; @jaton_groundwork_2023].

3.3.2 Scientific Movements and Guiding Principles

First, 3.3.2.1 examines the movement towards more open and transparent forms of research. Open scholarship is a broad concept that encompasses practices such as open access publishing, open data, open source software, and open educational resources. The subsection explores the benefits and challenges of open scholarship, and how it can help to increase the accessibility and impact of research data. Then, 3.3.2.2 explores the growing trend of involving members of the public in scientific research. Citizen science and citizen humanities involve collaborations between scientists and non-expert individuals, with the aim of generating new knowledge or solving complex problems. The subsection examines the benefits and challenges of citizen science and citizen humanities, and how they can help to democratise research. 3.3.2.3 examines the set of guiding principles designed to ensure that research outputs are FAIR. It explores the importance of each data principle for research integrity, reproducibility, and collaboration, and provides examples of how they can be implemented in practice. 3.3.2.4 explores the importance of ethical and culturally sensitive data governance practices for indigenous communities that are materialised through CARE. These principles provide a framework for managing data in a way that is consistent with the values and cultural traditions of indigenous communities.
This part also explores the challenges and opportunities of implementing the CARE Principles for Indigenous Data Governance. Finally, 3.3.2.5 explores the concept of ‘Collections as Data’, a perspective that has emerged from the practical need and desire to improve decades of digital collecting practice. This approach re-conceptualises collections as ordered digital information that is inherently amenable to computational processing.

3.3.2.1 Towards Open Scholarship

According to FOSTER[88], Open Science can be described as ‘[…] the practice of science in such a way that others can collaborate and contribute, where research data, lab notes and other research processes are freely available, under terms that enable reuse, redistribution and reproduction of the research and its underlying data and methods.’ [@foster_open_2019]. In recent years, the principles of Open Science, which historically include Open methodology, Open source, Open data, OA, Open peer review, as well as open educational resources, have become increasingly important as they emphasise transparency, collaboration and accessibility in scientific research [@bezjak_open_2019]. Open methodology refers to the sharing of research processes and methods, allowing other researchers to reproduce and build on existing work [see @vicente-saez_open_2018]. Open source software and tools enable researchers to collaborate, while open data practices promote the sharing of research data in ways that are accessible, discoverable and reusable by others[89]. Open access seeks to remove financial and other barriers to accessing scientific knowledge, while open peer review provides greater transparency and accountability in the publication process. Finally, open educational resources encourage the sharing of teaching and learning materials, thereby facilitating the dissemination of knowledge and skills.

@unesco_preliminary_2019 conducted a preliminary study of the technical, financial and legal considerations related to the promotion of Open Science. This research underscored the necessity for a holistic approach to Open Science and stressed the significance of tackling international legal matters, as well as the existing challenges stemming from unequal access to justice, which can hinder global scientific collaboration. This study laid the groundwork for a recommendation on making ‘[…] multilingual scientific knowledge openly available, accessible and reusable for everyone, to increase scientific collaborations and sharing of information for the benefits of science and society, and to open the processes of scientific knowledge creation’ [@unesco_implementation_2021 p. 7]. UNESCO identified five elements related to Open Science, spanning openness of scientific knowledge, infrastructures, and the engagement of societal actors, as well as dialogue with associated and diverse knowledge systems. This includes acknowledging the rights of indigenous peoples and local communities to govern and make decisions on the custodianship, ownership, and administration of data on traditional knowledge and on their lands and resources. Figure 3.10 provides a visual summary of this.

Figure 3.10: Open Science Elements, Redrawn Slide from Presentation of Ana Persic [@morrison_redrawn_2021 citing [@persic_building_2021]]

While Open Science offers numerous benefits, it also presents challenges and potential drawbacks that warrant careful consideration. One major concern is the risk of exacerbating inequities between researchers from well-resourced institutions and those from less privileged backgrounds.
Open access publishing often entails significant costs in the form of article processing charges, which can disproportionately burden researchers without adequate funding support [@burchardt_researchers_2014]. Additionally, Open Science practices relying on open protocols may be vulnerable to misuse, such as automated bots excessively crawling open repositories or datasets. This can lead to overloading systems, unauthorised data extraction, or unintended uses of research outputs [see @irish_bots_2023; @li_good_2021]. These risks underscore the importance of balancing openness with safeguards that ensure equitable participation and secure, sustainable access to research materials.

These challenges are particularly relevant in the context of DH, a field that harnesses the promise and impact of digital technologies and methodologies for the study and understanding of cultural phenomena. The adoption of Open Science principles has contributed to greater collaboration, transparency and accessibility in research practices in this field. Open data practices are particularly relevant, as they allow scholars to work with large and complex datasets, including digitised archives and social media data. Open educational resources can also be used to support the dissemination of CH literacy and skills, enabling wider audiences to engage with such resources. However, ensuring that such openness does not exacerbate inequities or introduce vulnerabilities requires thoughtful implementation.

In addition to the principles of Open Science, the concept of Open Scholarship has been introduced by [@tennant_tale_2020] as a broader approach that encompasses the arts and humanities and goes beyond the research community to the wider public. Open Scholarship emphasises the importance of making research and scholarship accessible to a wider audience, including non-experts, educators and policy makers. It can be particularly relevant to the arts and humanities, as they often deal with complex cultural materials and narratives that have wider societal implications. By making their work openly accessible and engaging with non-experts, humanities researchers can contribute to public discourse, promote cultural understanding, and inform policy and decision-making. Open scholarship can also support greater collaboration and innovation within the Arts and Humanities by enabling researchers to work collaboratively across disciplines and with a wide range of constituents. For instance, open educational resources can be used to develop collaborative teaching and learning materials that draw on the expertise of scholars and practitioners from different disciplines, while open data practices can facilitate the sharing and reuse of CH materials.

Conversely, @knochelmann_open_2019 advocates for the term Open Humanities as a dedicated discourse within the humanities. Notably, he argues that Open Humanities should adapt key Open Science elements to the Humanities’ unique context. In the case of preprints, the challenges in the humanities, such as limited discipline-specific preprint servers and linguistic diversity, require tailored solutions to encourage adoption. Open peer review in the humanities should accommodate the field’s subjectivity and diverse perspectives. Concerns about liberal copyright licenses revolve around potential misrepresentation and plagiarism, highlighting the importance of maintaining scholarly integrity regardless of the chosen license.
Knochelmann’s proposal underscores the need for context-sensitive approaches to promote openness and collaboration while respecting the humanities’ distinct characteristics. Overall, the principles of Open Science provide a framework for promoting greater collaboration, transparency and accessibility in research practices. Yet, the challenges discussed underscore the need for careful adaptation to address inequities, cybersecurity concerns, and field-specific nuances. The concept of Open Scholarship, which stresses the importance of making research and scholarship accessible to wider audiences, can be instrumental in broadening the impact of research in both the natural sciences and the humanities, as Open Science encourages greater collaboration and innovation across disciplines. Ultimately, this underscores the need for adaptation and positions all academic disciplines as essential contributors to societal understanding, cultural preservation and informed decision-making, while ensuring the sustainability and integrity of open practices.

3.3.2.2 Citizen Science, Citizen Humanities

Citizen Science and Citizen Humanities are approaches that involve the public in scientific and humanities research, respectively. They have become increasingly popular in recent years as a means of democratising research and engaging the public in academic initiatives. Citizen Science, as articulated by @irwin_citizen_1995, embodies a fundamental commitment to sourcing knowledge beyond the confines of academia, with a deliberate focus on addressing the concerns and interests of the public. This perspective underscores the transformative power of Citizen Science, making it a catalyst for a more democratic approach to scientific endeavours. @bonney_citizen_1996’s perspective complements this vision by framing Citizen Science as a collaborative process where amateur enthusiasts actively participate in data collection for academic science, all the while gaining a deeper understanding of scientific principles and processes. In this light, Citizen Science emerges as an ideal vehicle for science education and a potent tool for enhancing public appreciation of scientific pursuits. These viewpoints loosely align with the Oxford English Dictionary’s definition, which characterises Citizen Science as ‘scientific work undertaken by members of the general public, often in collaboration with or under the direction of professional scientists and scientific institutions’ and traces the earliest evidence of the term to 1989 [@oxford_english_dictionary_citizen_2023]. As such, Citizen Science stands as a harmonious intersection of public engagement, education, and scientific inquiry, amplifying the voice of non-academic contributors and democratising the scientific landscape.

The public can play a vital role in data collection, analysis, and interpretation. This involvement can take the form of tracking wildlife sightings, monitoring water quality, or assessing air pollution. By participating in these activities, citizens become direct contributors to the generation of valuable scientific data. The transformative power of Citizen Science extends across a wide spectrum of scientific disciplines, emphasising its capacity to democratise and broaden the reach of scientific endeavours [see @vohland_science_2021].
Citizen Science is a form of co-creation: whether viewed as an innovation-oriented means of value creation [@jansma_co-creation_2022] or as a more radical form of empowerment, it reinforces the democratisation of the research process [@metz_co-creative_2019]. It amplifies the voice of non-academic participants in scholarly pursuits, reflecting a profound shift in the way science is conducted. This collaborative model demonstrates how public engagement enriches the scientific landscape, allowing for the inclusion of different perspectives and a wider range of voices in the pursuit of knowledge. Furthermore, engaging in participatory practices also involves elements of ‘phronesis’[90] [see @mehlenbacher_expertise_2022], encompassing moral, affective, and care-oriented dimensions.

Trust is also a foundational and indispensable element in the landscape of participatory initiatives [see @dahlgren_diversity_2020]. The success and sustainability of projects within Citizen Science heavily rely on establishing and maintaining trust among all stakeholders involved. This trust extends in multiple directions. First and foremost, participants must trust the project organisers and platforms that host these initiatives. They must have confidence that their contributions will be used responsibly and ethically, with respect to their time and effort. When contributors are assured that their involvement is valued and that the data they provide serves a meaningful purpose, their motivation to participate and provide accurate information is bolstered. Conversely, project organisers and institutions also need to instil trust in participants. Transparency in project objectives, methodology and data use is paramount. Clear and consistent communication is essential to address participants’ concerns and provide feedback on the impact of their contributions. This two-way trust is the foundation of successful participatory projects and facilitates long-term engagement.

In Citizen Humanities, members of the public can participate in activities such as crowdsourced transcription, tagging, and annotation of digital CH materials. These activities can help to uncover new knowledge and insights, as well as to make CH materials more accessible to a wider audience [@strasser_citizen_2018]. It is important to note that within the context of these terms, Citizen Science is often regarded as the broader concept, encompassing both Citizen Science and Citizen Humanities. While the primary distinction between the two may, in some cases, appear to be terminological, in practice, they both exemplify the principles of open and inclusive research, akin to the concepts of Open Science and Open Humanities discussed in the preceding subsection. These approaches foster collaboration and engagement between researchers and the public, deepening the public’s understanding and appreciation of the research process as a whole [@zourou_citizen_2022]. This inclusive perspective, even if those participatory activities have been more widely used in the natural sciences than in the humanities [@lowry_is_2021], underscores Citizen Science as an umbrella term encompassing both scientific and humanities endeavours, each enriched by the active participation of the public. While Citizen Science involves the public in research, such initiatives differ from crowdsourcing projects in several ways.
Crowdsourcing typically involves the outsourcing of tasks to a large group of people, often through online platforms, with the aim of completing a specific task or project [@ridge_crowdsourcing_2017]. In contrast, Citizen Science focuses more on engagement and collaboration, with the goal of involving the public in the research process and generating new knowledge. That being said, there is also a convergence between Citizen Science and crowdsourcing projects. In many cases, Citizen Science initiatives may also involve crowdsourcing tasks, such as collecting or annotating data. Similarly, crowdsourcing projects may involve elements of Citizen Science, particularly when they aim to engage the public in scientific or CH research [@ridge_5_2021]. For instance, @haklay_citizen_2013 [pp. 115-116] distinguishes four categories or levels of participation in Citizen Science projects, each serving as a rung on the ladder of public engagement. The levels are as follows: Level 1. Crowdsourcing: citizens act as sensors and volunteered computing resources. Level 2. Distributed intelligence: citizens serve as basic interpreters and volunteered thinkers. Level 3. Participatory science: citizens actively participate in problem definition and data collection. Level 4. Extreme Citizen Science: at the highest level, citizens engage in collaborative science that encompasses problem definition, data collection, and analysis. When applied in the context of Citizen Humanities, public participation takes diverse forms. This involvement can encompass activities such as the public’s engagement in archaeological finds recording, as demonstrated by the Finnish Archaeological Finds Recording Linked Open Database (SuALT) project [@wessman_citizen_2019]. Another illustration is the case of the Citizen-Led Urban Environmental Restoration project where ‘young citizen scientists [in Jamaica and the United States] worked closely with museum scientists to restore two environmentally degraded urban sites’ [@commock_connecting_2023]. In terms of crowdsourcing of CH data, or more broadly in the humanities, @owens_digital_2013 [p. 121] discusses two primary challenges associated with integrating the concept. He highlights that both the terms ‘crowd’ and ‘sourcing’ pose certain problems. Successful crowdsourcing initiatives in libraries, archives, and museums, as he notes, typically do not rely on extensive crowds, and they are far from resembling traditional labour outsourcing endeavours. Furthermore, Owens emphasises that the central focus of such initiatives is not on amassing large crowds but rather on cultivating engagement and participation among individuals in the public who have a genuine interest. As Citizen Humanities broadens its scope to encompass a wider public engagement in DH and CH research, successful collaborations between DH and relevant research infrastructures have shown promising results [@fiser_boost_2018; @simpson_zooniverse_2014]. Furthermore, the integration of scientific and curatorial knowledge plays a pivotal role in CH and humanities studies, uncovering previously unknown contextual information within original materials [@france_integrating_2014]. As illustrated by institutions like the National Library of Estonia, the shift towards human-centred approaches and the development of DH services exemplify the expansion of Citizen Humanities [@andresoo_hundred-year-old_2018]. Incorporating user-generated or user-enhanced metadata still presents several challenges [@raemy_applying_2021].
One major challenge is ensuring the quality and consistency of the data. Another challenge is managing the large volume of data generated by users. With increasing numbers of participants and contributions, it can become difficult to process and organise the data in a way that is useful for research and for the broader public. As @dahlgren_diversity_2020 argue: Participatory metadata production has been valued for its potential to reduce the workload of the heritage institutions and make possible speedier digitization. However, in practice, little of the resulting metadata has been reinserted into the institutions databases and used in-house by information specialists. This challenge is compounded by the fact that user-generated metadata may be unstructured, making it more difficult to analyse and interpret. To address these challenges, it can be helpful to have a robust data curation strategy, maintained by a team that can communicate with participants on a regular basis, as well as tools and technologies that enable efficient data processing and analysis. LOD can also be a useful approach for organising and linking diverse sources of information, enabling researchers to incorporate different perspectives and opinions into their analysis (a minimal sketch of such reconciliation follows at the end of this subsection). This form of participation often involves micro-tasks, akin to ‘puzzle-like’ tasks, connecting users closely with the subjects they are describing [see @ridge_enriching_2023]. The dynamics of participatory projects are intriguing and multifaceted. As expressed by @dahlgren_diversity_2020: [It] is the often tightly curated top–down design of crowdsourcing platforms where participation is wide in terms of numbers of participants but small in terms of what those participants are allowed to do. The second involves the preconception that the crowd per se, because of its sheer size, in some ways represents a diversity of perspectives and experiences, an idea which is often put forward as one of the benefits of participatory metadata production. Five recommendations outlined by [@ridge_recommendations_2023], specifically geared towards the CH domain, can serve as valuable guidance for various participatory endeavours. These recommendations encompass:
Infrastructure: ensuring platform sustainability by supporting existing tools alongside new developments;
Evidencing and Evaluation: creating an evaluation toolkit to emphasise impact and wider benefits;
Skills and Competencies: establishing a self-guided skills assessment tool and workshops for upskilling;
CoP: funding international knowledge-sharing events like informal meetups, low-cost conferences, and peer review panels, ensuring inclusivity beyond limited regional funding projects;
Incorporating Emergent Technologies and Methods: providing support for educational resources and workshops to anticipate the opportunities and implications of emerging technologies.
These recommendations offer a versatile framework that can be applied to various participatory efforts, transcending the boundaries of specific domains and promoting a more inclusive and effective approach to public engagement in research and collaborative initiatives. By adhering to these measures, Citizen Science projects can better flourish, fostering a collaborative and proficient community of practitioners. Rather than creating new infrastructure, research projects should leverage and extend existing ones, such as Zooniverse[91], a generic Citizen Science portal, and FromThePage[92], a transcription platform.
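To make the reconciliation point above concrete, the following is a minimal sketch – not drawn from the thesis – of how a volunteer-contributed tag might be recorded as a Web Annotation whose body is a LOD concept rather than a free-text string. It assumes Python with rdflib; the collection URIs and the Wikidata QID are placeholders.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

# Hypothetical identifiers for illustration; only the oa: vocabulary URI is real (WADM).
OA = Namespace("http://www.w3.org/ns/oa#")
EX = Namespace("https://example.org/")

g = Graph()
g.bind("oa", OA)

annotation = EX["annotation/1"]
photograph = EX["photograph/42"]  # a digitised photograph in a hypothetical collection
concept = URIRef("http://www.wikidata.org/entity/Q000000")  # placeholder Wikidata QID

# A volunteer's tag, stored not as a free-text string but as a link to a LOD concept
g.add((annotation, RDF.type, OA.Annotation))
g.add((annotation, OA.hasTarget, photograph))
g.add((annotation, OA.hasBody, concept))
g.add((concept, RDFS.label, Literal("Fribourg", lang="en")))

print(g.serialize(format="turtle"))
```

Because the annotation body is a URI rather than an opaque string, such contributions can later be aggregated, deduplicated, or enriched with labels in other languages without reparsing free text.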
In summary, both Citizen Science and Citizen Humanities represent participatory methods of inquiry. While they have gained popularity, critical discussions regarding their potential limitations, notably in terms of diversity, are integral to their ongoing development. These critical discussions encompass issues like the challenge of addressing notions of volunteer – thus unpaid – labour, lack of diversity, and countering the dominance of traditional, often exclusive scientific practices [see @stengers_another_2018]. These conversations serve as essential drivers for the evolution of participatory approaches, prompting a reevaluation and refinement of their methodologies to ensure greater inclusivity and equity [@lewenstein_is_2022]. 3.3.2.3 FAIR Data Principles The FAIR data principles[93] were developed to ensure that three types of entities – namely data, metadata, as well as infrastructures – are Findable, Accessible, Interoperable, and Reusable. The four key principles of FAIR and their underlying 15 sub-elements or facets are as follows [@wilkinson_fair_2016]:
F. Findable — (Meta)data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services.
F1 (Meta)data are assigned a globally unique and persistent identifier (PID)
F2 Data are described with rich metadata (defined by R1)
F3 Metadata clearly and explicitly include the identifier of the data they describe
F4 (Meta)data are registered or indexed in a searchable resource
A. Accessible — Once users find the required data, they need to know how the data can be accessed, possibly including authentication and authorisation.
A1 (Meta)data are retrievable by their identifier using a standardised communications protocol
A1.1 The protocol is open, free, and universally implementable
A1.2 The protocol allows for an authentication and authorisation procedure, where necessary
A2 Metadata are accessible, even when the data are no longer available
I. Interoperable — The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing.
I1 (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
I2 (Meta)data use vocabularies that follow FAIR principles
I3 (Meta)data include qualified references to other (meta)data
R. Reusable — The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings.
R1 (Meta)data are richly described with a plurality of accurate and relevant attributes
R1.1 (Meta)data are released with a clear and accessible data usage license
R1.2 (Meta)data are associated with detailed provenance
R1.3 (Meta)data meet domain-relevant community standards
Originally introduced to improve data management and sharing in the life sciences, the FAIR principles have evolved into a widely adopted framework that transcends research disciplines. They have been adopted in a wide range of fields, including astronomy [@otoole_fair_2022], genomics [@corpas_fair_2018], environmental science [@crystal-ornelas_enabling_2022] and the humanities. In particular, FAIR principles have been applied to make historical archives, artworks or linguistic datasets more openly available for human users and search engines.
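As a hedged illustration of what some of these facets can look like in practice, the sketch below describes a dataset with DCAT and Dublin Core terms using Python’s rdflib; the DOI, titles, and agent names are placeholder assumptions, not an actual record.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")

# F1: a globally unique, persistent (and resolvable, cf. A1) identifier -- placeholder DOI
dataset = URIRef("https://doi.org/10.5281/zenodo.0000000")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dcterms", DCTERMS)

g.add((dataset, RDF.type, DCAT.Dataset))
# F2/R1: rich, descriptive metadata
g.add((dataset, DCTERMS.title, Literal("Photographic collection metadata (example)")))
g.add((dataset, DCTERMS.description, Literal("Descriptive records for a digitised photographic collection.")))
# R1.1: a clear, accessible usage licence
g.add((dataset, DCTERMS.license, URIRef("https://creativecommons.org/licenses/by/4.0/")))
# R1.2: (minimal) provenance
g.add((dataset, DCTERMS.creator, Literal("Example Archive")))

# F4 would then be met by registering this record in a searchable resource, e.g. a data portal.
print(g.serialize(format="turtle"))
```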
Moreover, CHIs have embraced FAIR principles as guidelines and best practices, employing them in the deployment of repositories, virtual research environments or data platforms [@hahnel_how_2020; @beretta_challenge_2021]. Yet, the concept of FAIR data management practices in the humanities is not always straightforward, as demonstrated by @gualandi_what_2022 at the University of Bologna in Italy. In their study, 19 researchers from the Department of Classical Philology and Italian Studies were interviewed to investigate the concept of ‘data’ in the humanities, particularly in relation to the FAIR principles. The study identified 13 types of research data based on participant input, such as publications, primary sources (manuscripts, artworks), digital representations of CH resources, but also websites, events, or standards. This suggests that ‘data’ within FAIR should encompass all inputs and outputs of humanities research. The research also emphasised the importance of methodologies and collaboration in managing research effectively, stressing the need for clarity and consensus in applying FAIR data principles within the field. Indeed, implementing FAIR can be complex due to the variety of data types and existing practices. Such complexity requires structured methods. As [@jacobsen_fair_2020 p. 11] point out, FAIR is not a standard; it is a guide that requires implementation based on interpretation. Similarly, @dunning_are_2017 [pp. 187-188] emphasise the multifaceted nature of the FAIR Data Principles and the need to view compliance as an aspirational objective. Their research reveals challenges in achieving full compliance, with specific difficulty in the Interoperable and Re-usable facets. They advocate for basic policy implementation in areas like PIDs, metadata, licensing, and protocols, alongside transparent documentation. Additionally, the authors stress the importance of using HTTPS – an extension of HTTP (see 3.4.1) – to ensure secure data transmission and accessibility. Finally, they underline the importance of collaboration between (data) archivists and researchers. @go_fair_fairification_2016 – an initiative that aims to implement the FAIR data principles – outlined a seven-step FAIRification process, which includes essential stages in the transformation of data, as illustrated by Figure 3.11. These steps begin with the retrieval of non-FAIR data, followed by in-depth analysis to understand the content and structure of the data. The process then requires the creation of a semantic model that accurately defines the meaning and relationships of the data in a computational way, often involving the integration of existing ontologies and vocabularies. The fourth step involves linking data through the application of Semantic Web technologies, thereby improving interoperability and integration with disparate data sources. In addition, the assignment of a clear licence is highlighted as a separate step, emphasising its key role in enabling data reuse and open access. As a sixth step, metadata need to be assigned in order to support data discovery and access. Finally, the FAIRified data are deployed or published with their associated metadata and licence, ensuring that they can be accessed and discovered by search engines, even if authentication and authorisation requirements are in place. As a result, the new FAIR dataset can be more conveniently aggregated with other data sources, making it more straightforward to raise research questions across multiple sources.
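A minimal sketch of these FAIRification steps might look as follows, assuming Python with rdflib and Schema.org as the semantic model; the CSV content, base URIs, and the reconciled Wikidata identifier are all illustrative assumptions rather than part of the GO FAIR process itself.

```python
import csv
import io

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

SCHEMA = Namespace("https://schema.org/")
EX = Namespace("https://example.org/")  # hypothetical base URI

# Steps 1-2: retrieve and analyse non-FAIR data (a flat CSV file, inlined here)
raw = "id,name,birthplace\n1,Jane Doe,Basel\n"
rows = csv.DictReader(io.StringIO(raw))

# Steps 3-4: apply a semantic model (Schema.org) and link to external sources
g = Graph()
g.bind("schema", SCHEMA)
for row in rows:
    person = EX[f"person/{row['id']}"]
    g.add((person, RDF.type, SCHEMA.Person))
    g.add((person, SCHEMA.name, Literal(row["name"])))
    # In practice this link would come from a reconciliation service; placeholder QID here
    g.add((person, SCHEMA.birthPlace, URIRef("http://www.wikidata.org/entity/Q000000")))

# Steps 5-6: assign an explicit licence and metadata to the dataset itself
dataset = EX["dataset/people"]
g.add((dataset, DCTERMS.title, Literal("People dataset (FAIRified example)")))
g.add((dataset, DCTERMS.license, URIRef("https://creativecommons.org/licenses/by/4.0/")))

# Step 7: deploy/publish the graph, e.g. as Turtle behind a resolvable URI
print(g.serialize(format="turtle"))
```

The steps are compressed here; in a real pipeline, the analysis and reconciliation stages are typically the most labour-intensive.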
Figure 3.11: The FAIRification Process. Adapted from @go_fair_fairification_2016 A further illustration of how FAIR can be deployed is the conceptualisation of the FDO, which includes a strong binding of various types of metadata [@schultes_fair_2019 pp. 7-9]. The members of the @european_commission_directorate_general_for_research_and_innovation_turning_2018 [p. 35] underline that the establishment of a FAIR-compliant ecosystem hinges on the FDO concept, an implementation framework to develop scalable cross-disciplinary capabilities. As illustrated in Figure 3.12, data must be assigned PIDs and accompanied by detailed metadata to ensure reliable discoverability, usability and citation. They also argue for using widely accepted file formats and adhering to community-specific metadata standards as well as vocabularies to support interoperability and reuse. Figure 3.12: The FDO Model [@european_commission_directorate_general_for_research_and_innovation_turning_2018] @soiland-reyes_updating_2022 highlight the potential of LOD to drive the adoption of FDO within research infrastructures. While this approach provides specifications and tools, the proliferation of standards and metadata vocabularies poses challenges to interoperability and implementation. To address these hurdles, the authors present the use of FAIR Signposting[94], which enables straightforward navigation to core FDO properties without the need for complex content negotiation heuristics. In summary, the FAIR data principles comprise four key principles, or 15 facets, that provide a comprehensive framework for data management and sharing. While processes are in place to facilitate their implementation, the path to FAIRness can be complex, with interoperability and compliance challenges. A key element is the thoughtful mapping of different metadata standards and the strategic incorporation of Linked Data technologies. The FDO approach is equally relevant to the CH sector, supporting the preservation, accessibility and sharing of CH data and resources. The sharing of code, accompanied by comprehensive documentation, also enhances such an ecosystem by facilitating the exchange of valuable technical knowledge and resources. 3.3.2.4 CARE Principles for Indigenous Data Governance The CARE[95] Principles were developed to protect Indigenous data sovereignty [@carroll_care_2020] as complementary guidelines to FAIR. The principles are as follows:
C. Collective Benefit: Data should be collected and used in a way that benefits the community as a whole, rather than just individuals or organisations.
A. Authority to Control: Indigenous communities should have control over their own data, including how it is collected, stored, and used.
R. Responsibility: Those who collect and use Indigenous data have a responsibility to ensure that it is used ethically and responsibly.
E. Ethics: Indigenous data governance should be guided by ethical principles that reflect the values and beliefs of the community.
Adhering to the CARE principles is vital for promoting equitable data practices. The CARE principles are built upon existing data reuse principles like FAIR but also integrate the efforts of Indigenous-led networks focused on Indigenous data governance and research control. While the FAIR principles emphasise data accessibility, the CARE principles go beyond that by considering actions aligned with the needs and intentions of individuals and communities connected to the data [@carroll_operationalizing_2021].
By embedding CARE-informed data practices into project design, the ethical and responsible use of Indigenous data can be enabled to improve inclusive policies and services [@robinson_caring_2021]. 3.3.2.5 Collections as Data In the same vein as FAIR and CARE, the Vancouver Statement on Collections as Data should be mentioned; it originated from a meeting of GLAM practitioners in Vancouver, Canada in April 2023 and builds on the Santa Barbara Statement on Collections as Data[96] formulated in 2017 [@padilla_always_2017]. The statement highlights the growing global engagement with collections as data. It promotes the responsible computational use of collections to empower memory, knowledge and data practitioners. It emphasises ethical concerns, openness and participatory design, as well as the need for transparent documentation and sustainable infrastructure. The statement, comprising ten recommendations, also recognises the potential impact of data consumption by AI, and the importance of considering climate impacts and exploitative labour [@padilla_vancouver_2023]. More specifically, the following ten principles have been established for anyone with (meta)data stewardship responsibilities:
1. Collections as Data development aims to encourage computational use of digitised and born digital collections.
2. Collections as Data stewards are guided by ongoing ethical commitments.
3. Collections as Data stewards aim to lower barriers to use.
4. Collections as Data designed for everyone serve no one.
5. Shared documentation helps others find a path to doing the work.
6. Collections as Data should be made openly accessible by default, except in cases where ethical or legal obligations preclude it.
7. Collections as Data development values interoperability.
8. Collections as Data stewards work transparently in order to develop trustworthy, long-lived collections.
9. Data as well as the data that describe those data are considered in scope.
10. The development of collections as data is an ongoing process and does not necessarily conclude with a final version.
In a final report, @padilla_collections_2023 underscore the transformative potential of the Collections as Data paradigm, particularly in the context of GLAMs. The principles and case studies highlighted in the report offer a roadmap for organisations to responsibly and ethically engage with their collections in the digital era. It is imperative to recognise that the journey towards fully realising its potential is ongoing and requires a commitment to continual evaluation and adaptation. This involves not only adhering to established principles but also being responsive to emerging technological trends, societal changes, and evolving ethical considerations. The role of AI in shaping the future of Collections as Data is particularly noteworthy. As AI continues to advance, it offers both opportunities and challenges in terms of enhancing access and insights into collections while also necessitating careful consideration of ethical implications, such as bias and privacy. Furthermore, the growing emphasis on climate impacts and sustainable practices in data stewardship is a crucial aspect that aligns with global efforts towards environmental responsibility.
Building on the discussion of the principles and initiatives surrounding Collections as Data, @candelaChecklistPublishCollections2023a [p. 7] outline a checklist tailored to GLAM institutions to publish Collections as Data[104]. They devised 11 criteria, including the provision of clear licensing for dataset reuse without restrictions, citation guidelines, comprehensive documentation, the use of public platforms, sharing examples of dataset use, structuring the data, providing machine-readable metadata, participation in collaborative edition platforms, offering API access to the repository, developing a dedicated portal page, and defining clear terms of use. These recommendations serve as a structured framework to enhance accessibility, usability, and interoperability, fostering engagement with cultural and historical collections. An in-depth analysis was then carried out to assess the compliance of repositories, projects and platforms from six organisations with the checklist, namely the British Library[97], the National Library of Scotland[98], LoC[99], the Royal Danish Library[100], Meemoo[101], and the Miguel de Cervantes Virtual Library[102] [@candelaChecklistPublishCollections2023a p. 13]. Although several institutions have opened access to their collections through APIs, such as IIIF’s capabilities, challenges remain in fully embracing the Collections as Data principles. Barriers include resource limitations and the balance between making collections widely available through simplified access and downloads. In addition, different items within the checklist may require different levels of maturity and prioritisation, often requiring collaborative efforts. Initial results show that the checklist is a valuable tool for identifying relevant issues for individual institutions, although prioritisation may vary according to context and user needs. Collaborative initiatives between institutions are underway to improve the practical implementation and user experience, particularly in the structuring of datasets [@candelaChecklistPublishCollections2023a pp. 20-21]. While there are still relatively few examples of institutions that have fully adopted the Collections as Data principles, several case studies – such as at the Royal Library of Belgium, materialised through DATA-KBR-BE[103] [see @chambers_collections_2021] – and initiatives offer valuable insights. A further notable advancement in the area of publishing Collections as Data is the contribution of @alkemade_datasheets_2023. They have outlined a series of recommendations for developing datasheets, or modular templates, designed for CH datasets. This initiative holds significant importance for GLAMs, facilitating the structured organisation of their data, notably for seamless integration with ML tools, where they propose providing a description of how content has been influenced by digitisation. Their work highlights the need for documentation, focusing on tailored metrics, biases, and system integration. The proposed datasheets aim to detail the creation, selection, and digitisation processes, enhancing transparency and addressing the distinctive challenges of digital CH data. Emphasising a narrative approach to articulating biases, the authors acknowledge the complex historical context and ethical implications.
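Purely as an illustration of how such a datasheet could be given a machine-actionable shape, here is a hypothetical sketch in Python; the field names and example values are my own assumptions, not the template proposed by @alkemade_datasheets_2023.

```python
from dataclasses import dataclass, field

# A simplified, hypothetical datasheet structure loosely inspired by the
# recommendations discussed above; all field names are illustrative.
@dataclass
class CollectionDatasheet:
    title: str
    creation: str        # how and why the original collection came into being
    selection: str       # what was (and was not) chosen for digitisation, and why
    digitisation: str    # processes that shaped the digital surrogates
    known_biases: str    # narrative account of gaps, skews and silences
    licence: str
    citation: str
    machine_readable_formats: list = field(default_factory=list)

sheet = CollectionDatasheet(
    title="Example photographic collection",
    creation="Fieldwork photographs assembled by an ethnographic archive.",
    selection="Only glass negatives in good physical condition were digitised.",
    digitisation="Scanned at 600 dpi; original colour targets were not retained.",
    known_biases="Urban subjects are over-represented; captions exist in German only.",
    licence="CC BY 4.0",
    citation="Example Archive (2024). Example photographic collection.",
    machine_readable_formats=["IIIF Presentation API 3.0", "CSV"],
)
print(sheet.title)
```

The narrative fields deliberately hold prose rather than codes, echoing the authors’ emphasis on articulating biases narratively rather than reducing them to metrics.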
3.4 Open Web Platform and Linked Data The web, created at CERN in 1989 by Tim Berners-Lee[105], has enabled scholars and CH practitioners to access and analyse vast amounts of data in new ways, thereby opening the door to the creation of federated datasets and KGs. At the heart of this transformation are two pivotal concepts: the Open Web Platform and Linked Data. The Open Web Platform refers to a set of technologies and standards that allow for the creation and sharing of content on the web. Linked Data, on the other hand, refers to a set of principles and technologies that enable the publication and interlinking of data on the web, creating a web of data that can be easily navigated and used by humans and machines alike. Recognising the web as an environment that supports a wide range of applications beyond traditional browser-based interactions is becoming increasingly important. Platforms such as social networks like Facebook, Twitter, Instagram, and Mastodon, streaming services such as Netflix and Disney+, as well as cloud-based applications, all leverage web technologies even when not accessed via a traditional web browser. These platforms are integral to the web ecosystem, highlighting the web’s role as a foundational platform for diverse digital interactions and data exchanges. Much of what we know today as the Internet is the result of developments by many individuals and organisations. A significant milestone, however, was the development of the TCP/IP protocol by Vinton Cerf and Robert E. Kahn in the 1970s [see @cerf_protocol_1974]. This protocol became the standard networking protocol on the ARPANET in 1983, marking the beginning of the modern Internet [@leiner_past_1997]. Understanding the differentiation between the Internet and the web is crucial. The former is a global network of interconnected computers that communicate using Internet protocols, forming the infrastructure that enables online communication. The web, or World Wide Web, is a service built on top of the Internet, leveraging HTTP to transmit data. While the Internet provides the underlying connectivity, the web offers a way to access and share information through websites and links. This differentiation is vital in comprehending how the web, as a part of the Internet, has evolved into a versatile and ubiquitous platform supporting a wide array of applications. This section, divided into five subsections, explores some of the key concepts underlying the Open Web Platform and Linked Data, and their applications in the CH field. First, 3.4.1 examines the foundational principles and technologies that underpin the Open Web Platform. This includes an overview of principles, protocols such as HTTP, and the use of URIs to identify resources on the web. This part also explores different types of web architectures, such as the client-server model, and the concept of web services, which allow for the exchange of data and functionality across different applications and systems. 3.4.2 explores the vision of the web as a giant, interconnected database of structured data that can be queried and manipulated by machines. The subsection examines the technologies and standards that make up the Semantic Web, including RDF, RDFS, OWL, and SPARQL. Subsection 3.4.3 examines the set of principles designed to promote the publication and interlinking of data on the web.
The subsection explores the four principles of Linked Data – using URIs to identify resources, using HTTP to retrieve resources, providing machine-readable data, and linking data to other data. Subsection 3.4.4 examines the set of criteria for publishing data on the web in a way that makes it easily discoverable, accessible, and usable. The subsection describes the Five-Star and Seven-Star deployment schemes, which include criteria such as providing data in a structured format, using open standards, and providing a machine-readable license. Finally, 3.4.5 explores the specific application of LOD in the CH domain. The subsection provides examples of how CHIs such as museums, libraries, and archives are using Linked Data to make their datasets more accessible and discoverable on the web. Overall, this section provides a comprehensive overview of the key concepts and technologies underlying the web as an open and linked platform, and their applications in the CH field and, more broadly, for any scientific endeavour – which is what the web started with [@nelson_d-lib_2022 citing @berners-lee_worldwideweb_1991]. Through exploring these concepts, we can gain a deeper understanding of how the web is evolving into a more open, interconnected, and data-driven platform, and how this evolution is transforming the way we access, use, and share information. 3.4.1 Web Architecture The web architecture has played a very important role in the development of scholarly research and CH practices, enabling new forms of collaboration, data sharing and interdisciplinary research. By providing a standardised and interoperable framework based on open standards for sharing and accessing data [@berners-lee_long_2010], it has facilitated the open exchange of information, even if citation has always been an issue, particularly for scholarly outputs [@lagoze_web-based_2012 p. 2223]. Web architecture is a conceptual framework led by the W3C that underpins and sustains the World Wide Web [@jacobs_architecture_2004], created to be ‘a pool of human knowledge’ [@berners-lee_world-wide_1994]. It encompasses the architectural bases of identification, interaction, and format – also referred to as representation – where HTTP provides the technical mechanisms for transmitting and accessing information. The web architecture is based on a set of identifiers, such as URIs, which are used to uniquely identify resources on the web. These identifiers play a crucial role in enabling users to find, access, and share information on the web, and they help to ensure that web-based systems are both user-friendly and interoperable. Here, it is valuable to distinguish between three key concepts: URI, URL, and URN, as a URI can be further classified as a locator, a name, or both [@berners-lee_uniform_2005 p. 7] – as shown in Figure 3.13. URI: It is the overarching term encompassing both URLs and URNs. It serves as a generic identifier for any resource on the web. URIs can be used to uniquely identify resources, regardless of the specific naming or addressing scheme employed. URL: It is a subset of URIs and refers to web addresses that specify not only the resource’s identity but also its location or how to access it. URLs often include the protocol (such as HTTP) and the resource’s specific location (e.g., a domain and path). URN: It is another subset of URIs that emphasises the resource’s identity rather than its location or how to access it.
URNs are designed to be persistent and unique, making them suitable for resources that are intended to be recognised and referenced over time. While URLs may change as resources move or evolve, URNs should remain constant. Figure 3.13: Overlap and Difference between URI, URL, and URN Interaction between web agents – i.e. persons or pieces of software acting in the information space on behalf of a person, entity or process – over a network involves URIs, messages and data. Web protocols, such as HTTP, are message-based. Messages can contain data, resource metadata, message data, and even metadata about the metadata of the message, typically for integrity checking [@jacobs_architecture_2004]. The web architecture allows for multiple representations of a resource. In this context, a data format specification becomes pivotal, encapsulating an agreement on how to correctly interpret the representation of data, as articulated by [@jacobs_architecture_2004]: A data format specification embodies an agreement on the correct interpretation of representation data. The first data format used on the Web was Hypertext Markup Language (HTML). Since then, data formats have grown in number. Web architecture does not constrain which data formats content providers can use. This flexibility is important because there is constant evolution in applications, resulting in new data formats and refinements of existing formats. Although Web architecture allows for the deployment of new data formats, the creation and deployment of new formats (and agents able to handle them) is expensive. Thus, before inventing a new data format (or “meta” format such as XML), designers should carefully consider re-using one that is already available. Access can also be mediated by content negotiation, which is a mechanism employed in web communication to determine the most appropriate representation of a resource to be sent to a client based on the client’s preferences and the available representations [@lagoze_web-based_2012 pp. 2223-2224]. At its core, web architecture is based on a set of architectural principles that guide the design and development of web-based systems and applications. These principles include concepts such as orthogonality, extensibility, error handling, and protocol-based interoperability. Orthogonality allows identification, interaction, and representation to evolve independently. Extensibility is key, enabling technology to adapt without compromising interoperability. Error handling addresses diverse errors, from predictable to unpredictable, ensuring seamless correction. Finally, the web’s protocol-based interoperability fosters communication across varied contexts, outlasting entities and facilitating the longevity of shared technology [@jacobs_architecture_2004]. Overall, these principles help to ensure that the web remains robust, reliable, and flexible. Web architectures can be categorised into several types, each offering a specific approach to designing and structuring web-based systems. Here, I will focus on the following three types of architectures – shown in Figure 3.14: the client-server model, the three-tier model, and SOA. Figure 3.14: Types of Web Architectures: Client-Server Model, Three-Tier Model, SOA The client-server model partitions the responsibilities between two key components: the client, which represents the user interface or user-facing part of the system, and the server, which is responsible for storing and serving data.
In this model, clients and servers communicate to perform various functions, such as requesting and delivering information [@oluwatosin_client-server_2014 pp. 67-68]. The three-tier model is another significant web architecture that introduces an additional layer between the client and server, resulting in a three-part structure. This architecture is designed to further segregate and manage the system’s components [@wijegunaratne_three-tier_1998 pp. 41-42]. The three tiers typically consist of the presentation tier (the user interface), the application tier (responsible for logic and processing), and the data tier (where data storage and retrieval occur). SOA is a web architecture that emphasises the creation and utilisation of services as the central building blocks of a system. Services in this context are self-contained, modular units of functionality that can be accessed and used independently by various components of a web application. These services are designed to be loosely coupled, meaning they can interact with other services without a deep dependency on one another. Overall, ‘SOA is a paradigm for organizing and packaging units of functionality as distinct services, making them available across a network to be invoked via defined interfaces, and combining them into solutions to business problems.’ [@laskey_service_2009 p. 101]. SOA can encompass various communication protocols, such as REST, which is a prominent architectural style for designing networked applications [@fielding_architectural_2000], primarily leveraging HTTP. RESTful services, i.e. applications that comply with the REST constraints, are designed to work with existing capabilities rather than creating new standards, frameworks and technologies [@battle_bridging_2008 p. 62]. These services are built around a set of constraints, including statelessness, a uniform interface, resource-based identification, and the use of standard request methods such as GET, POST, PUT, and DELETE [@tilkov_brief_2017]. The following are all the specified request methods enabling clients to perform a wide range of operations on resources[106] [@fielding_http_2022 p. 72]:
GET: Transfer a current representation of the target resource.
HEAD: Same as GET, but only transfer the status line and header section.
POST: Perform resource-specific processing on the request payload.
PUT: Replace all current representations of the target resource with the request payload.
DELETE: Remove all current representations of the target resource.
CONNECT: Establish a tunnel to the server identified by the target resource.
OPTIONS: Describe the communication options for the target resource.
TRACE: Perform a message loop-back test along the path to the target resource.
RESTful services, with their emphasis on using standardised HTTP methods and resource-based identification, offer a versatile means of designing web services and APIs. Their simplicity and compatibility with the web’s core protocols make them a practical choice for implementing various web-based applications. In the context of the Semantic Web, RESTful services can serve as a crucial component for accessing and exchanging graph data [see @lee_learning_2011]. 3.4.2 The Semantic Web The Semantic Web is ‘an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation’ [@berners-lee_semantic_2001 p. 35]. It was already part of @berners-lee_realising_1999’s vision and prediction that the web, in its next phase, would be understood by machines, i.e. shifting from a traditional web of documents to a web of data. @bauer_linked_2012 [p. 25]
articulates that ‘[t]he basic idea of a semantic web is to provide cost-efficient ways to publish information in distributed environments. To reduce costs when it comes to transferring information among systems, standards play the most crucial role’. At the heart of the Semantic Web lies the foundation of RDF. The original RDF specification, known as the RDF Model and Syntax, serves as the underlying mechanism that establishes the basic framework of RDF. This framework provides the cornerstone to facilitate the exchange of data among automated processes [@lassila_resource_1999]. A fundamental component within RDF is the RDF triple, as shown in Equation 3.1, comprising three essential elements: the subject (s), the predicate (p), and the object (o). In an RDF triple, the subject is the resource or entity about which a statement is made, the predicate is the relationship or property describing that statement, and the object is the value or resource associated with the statement. s –p→ o Equation 3.1: Triple Pattern Notation RDF statements are reminiscent of the semiotic triangle of @ogden_meaning_1930 [p. 11] – as illustrated in Figure 3.15 – where the referent is tantamount to the predicate of a triple. This analogy emphasises the intrinsic relationship between communication, representation and knowledge organisation. It highlights how both language and structured data rely on the establishment of connections and relationships to effectively convey meaning. Figure 3.15: The Semiotic Triangle by [@ogden_meaning_1930]. Figure 3.16 is an RDF graph about myself and where I was born, leveraging mostly Schema.org[107], a collaborative project and Linked Data vocabulary used to create structured data markup on websites. This graph consists of vertices and edges, where vertices can be either URIs or literal values, and the edges represent relationships between them. In plain language, the graph asserts that there is a person represented by the URL https://www.example.org/julien-a-raemy, who has the given name ‘Julien Antoine’ and the family name ‘Raemy’. The person’s birthplace is specified as a URL from Wikidata, which is of type schema:Place. Additionally, there is a statement indicating that the birthplace is named ‘Fribourg’. Figure 3.16: Example of an RDF Graph In the subject-predicate-object syntax of RDF, the subject can be either a URI or a blank node[108]. The predicate is a URI, like schema:givenName, and its aim is to establish connections between subjects and objects, describing the nature of the relationship. The object is either a URI, a blank node or a literal, such as ‘Julien Antoine’ or ‘Fribourg’. Objects can also act as subjects if they are identifiable, allowing for the expansion and interconnection of RDF graphs. The original specification proved too broad, leading to confusion, and a subsequent effort yielded an updated specification and new documents such as RDF/XML [@beckett_rdfxml_2004], a syntax specification recommended in 2004 that expresses an RDF graph as XML. It was later revised in 2014 as part of the RDF 1.1 document set [@gandon_rdf_2014], which also introduces the notion of an RDF dataset that can represent multiple graphs [@cyganiak_rdf_2014]. The RDF/XML serialisation of the earlier graph is given in the corresponding code snippet. (…) @idehen_semantic_2017 highlights a significant concern regarding earlier representations of the Semantic Web and how it has been portrayed. These portrayals often place undue emphasis on the pivotal role of XML as an ostensibly obligatory component in Semantic Web development.
To him, this historical perspective, particularly prominent around the year 2000, erroneously positioned XML as a superior alternative to HTML for constructing the Semantic Web. As illustrated by Figure 3.17, @idehen_semantic_2017’s revision embodies a Semantic Web layer cake that encompasses several technical or conceptual components.
Smart Applications and Services: These systems are constructed declaratively, as opposed to using an imperative approach, with a flexible integration of data models, interaction, and visualisation.
Trust: It is established through verifiable claims regarding identity, the source of content, and related issues.
Proof: It provides a basis for building trust, such as leveraging authentication tiers.
Transmission Security: It pertains to safeguarding data during its transit over networks. This protection is achieved by implementing established protocols, such as TLS, which includes inherent cryptographic support for ensuring the confidentiality and integrity of data as it travels across communication channels.
Unifying Logic: In this component, FOL[109] assumes a central role by providing the fundamental schema for modelling and comprehending data. These propositions serve as the core building blocks for problem-solving and decision-making, offering a universal and abstract structure that can be applied across various domains and applications[110].
Rules: They serve as the foundation for conducting reasoning and inference, for instance with SHACL, which specifies integrity constraints for data entry when constructing structured data with RDF. At this level, mapping languages such as R2RML and RML have a part to play. R2RML is a powerful language for expressing customised mappings from relational databases to RDF datasets [see @das_r2rml_2012]. On the other hand, RML extends these capabilities beyond relational databases, allowing for the transformation of various types of data sources, including CSV, XML, and JSON files, into RDF [see @dimou_rdf_2022]. Both languages are instrumental in bridging the gap between non-RDF data sources and the Semantic Web.
Query: Retrieving and manipulating structured RDF statements, achieved by employing SPARQL (see the sketch after this list).
Dictionaries: Collections of formal definitions (also referred to as vocabularies or ontologies) that describe entity and entity relationship types. They provide a structured framework for defining and organising concepts, properties, and relationships, which aids in modelling knowledge in a systematic and machine-readable manner. Some of the commonly used dictionary languages include RDFS, a simple vocabulary language used to define basic schema information for RDF; SKOS, designed for representing and organising KOSs, particularly in the context of thesauri and taxonomies; as well as OWL.
Abstract Language: The RDF syntax itself (as shown in Equation 3.1) serves as the basis.
Sentence Part Identifiers: To identify resources on the web, IRIs[111] or URIs can be used.
Document Types: Different serialisations of RDF exist within the Semantic Web, such as RDF/XML, Turtle or JSON-LD[112].
Semantic Web of Linked Data: The final component that holds the rest together, originating from Tim Berners-Lee’s vision.
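To ground the Abstract Language and Query layers, the following sketch rebuilds the graph of Figure 3.16 with Python’s rdflib and retrieves the birthplace name via SPARQL; it assumes the https variant of Schema.org, and the Wikidata QID is a placeholder for the actual identifier used in the figure.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

SCHEMA = Namespace("https://schema.org/")
person = URIRef("https://www.example.org/julien-a-raemy")
place = URIRef("http://www.wikidata.org/entity/Q000000")  # placeholder Wikidata URI

g = Graph()
g.bind("schema", SCHEMA)
g.add((person, RDF.type, SCHEMA.Person))
g.add((person, SCHEMA.givenName, Literal("Julien Antoine")))
g.add((person, SCHEMA.familyName, Literal("Raemy")))
g.add((person, SCHEMA.birthPlace, place))
g.add((place, RDF.type, SCHEMA.Place))
g.add((place, SCHEMA.name, Literal("Fribourg")))

# The Query layer: a SPARQL query over the in-memory graph
query = """
PREFIX schema: <https://schema.org/>
SELECT ?name WHERE {
  ?person schema:birthPlace ?place .
  ?place schema:name ?name .
}
"""
for row in g.query(query):
    print(row.name)  # -> Fribourg
```

The same query would run unchanged against a remote SPARQL endpoint, which is precisely the interoperability the layer cake is meant to convey.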
Figure 3.17: Tweaked Semantic Web Technology Layer Cake by @idehen_semantic_2017 Following on from the components outlined in Figure 3.17, I will look in more detail at further RDF features, serialisations, and RDF-based standards for representing, querying or validating graphs. In doing so, I will touch on some considerations related to the inference and reasoning of RDF graphs. Code Snippet 3.5 is a Turtle serialisation of Figure 3.16. Turtle, a W3C standard, is a notation for expressing this data in a structured and machine-readable format [@wood_linked_2014 p. 44]. It is a common syntax used for representing RDF data. It allows people to create statements in a friendlier manner than an RDF/XML serialisation [@beckett_rdf_2014]. Here are some of the most important features: (…) 3.4.3 Linked Data Principles 3.4.4 Deployment Schemes for Open Data 3.4.5 Linked Open Data in the Cultural Heritage Domain (…) 3.5 Linked Open Usable Data (…) 3.6 Characterising Community Practices and Semantic Interoperability (…) 3.7 Summary and Preliminary Insights This section provides a summary of what has been discussed in this literature review as well as some preliminary insights with regard to the LOUD ecosystem, chiefly the design principles, communities, standards, and implementations. It follows the flow of the present chapter and is organised into five subsequent parts. Finally, in 3.7.7, I end with a few reflections on why we ought to care about CH data in the wider sense. 3.7.1 Cultural Heritage Data CH data are unique and require different – or adjacent – methods of analysis than quantitative scientific data. The diversity of CH data, including tangible, intangible, and natural heritage, presents a challenge in preserving and promoting these resources effectively. Representing CH data in digital form can also be challenging, as it may lead to a loss of context and complexity. To address these challenges, a comprehensive understanding of the various types of heritage resources, their meanings, and values is necessary, along with an effective preservation and valorisation strategy that considers their heterogeneity. For this, I find that we need a vision that takes into account the diversity of human and non-human entities, entangled in community-based socio-technical activities, whether local, interdisciplinary or global. The following provides a concise summary of the key aspects explored in the first section, addressing the distinctive nature of CH data. The dimensions of CH data: I argue that three key dimensions need to be taken into consideration for CH data: heterogeneity, knowledge latency, and custodianship. Representing and embodying CH data: I discuss how the faithful recreation of materiality in digitised and digital-born resources remains a significant issue and argue that such resources need to be treated as new types of affordances or surrogates. Moreover, while achieving a high degree of digital material fidelity may be challenging, it is important to acknowledge that accessibility remains a paramount concern in this context. Collectives and apparatuses are entangled: Actors, encompassing individuals, institutions, local and global communities, along with the (digital) infrastructures and their components, are deeply interwoven. These human and non-human entities collaboratively shape and navigate the complex networks of interactions and technologies that form society.
Within this context, ANT emerges as a valuable theoretical framework for understanding these close relationships and interdependencies. 3.7.2 Cultural Heritage Metadata A variety of different metadata standards have emerged, tailored to different functions and purposes, with some dating back to the 1960s, particularly in library settings. However, the majority of other traditional CH metadata standards have emerged between the 1990s and the 2000s. In terms of descriptive metadata standards and conceptual models, MARC, ISAD(G), CDWA, CIDOC-CRM are being supplemented – or gradually substituted – by models and profiles like LRM, RiC, and Linked Art, which – to some extent – align with Linked Data principles, enabling richer relationships and enhancing access to vast and diverse collections. These evolving metadata standards support a more interconnected, open, and accessible information landscape, enabling researchers, practitioners, and the public to navigate and explore CH resources more effectively[113], especially when they are used in conjunction with controlled vocabularies, facilitating cross-domain reconciliation. As for aggregation of CH (meta)data, alternative web-based technologies to OAI-PMH appear to be viable options, using AS in particular. 3.7.3 Trends, Movements, and Principles CH data is intrinsically intertwined with broader trends including Big Data and AI, serving as both a reflection and a product of these influences. Yet it is vital to recognise that CHIs and DH practitioners are – or should be – instrumental in shaping and curating such data. They not only shape and curate them, but also inform and drive the very trends that permeate our data landscape. Their values, actions and methodologies represent critical drivers in the process, guiding the collection, preservation, and interpretation of CH data. The scientific movements and guiding principles of Open Science/Open Scholarship, Citizen Science/Citizen Humanities, FAIR, CARE, and Collections as Data collectively permeate and shape CH data. While the first two represent dynamic movements actively involving scholarly practices or the public, the other three are more concerned with the system in which data is situated. LOUD design principles and standards, which are more content-centric, bolster the effectiveness of these principles and movements, fostering open and accessible CH data through their real-world applicability. 3.7.4 Linked Open Data The Open Web Platform and Linked Data are foundational to the evolution of scholarly research and CH practices, enabling the creation of federated datasets and KGs. The section explores these concepts and their applications in the CH domain. The web architecture, underpinned by principles, protocols, and identifiers like URI, URL, and URN, facilitates the exchange of data and functionality across applications and systems. It emphasises architectural principles such as orthogonality and protocol-based interoperability and explores various web architectures, including the client-server model and SOA. The Semantic Web, a vision for a web understood by machines, relies on standards like RDF, RDFS, OWL, and SPARQL. The limitations of RDF in complex KR can be mitigated to some extent with RDF reification and quoted triples. SHACL is also introduced for data graph validation. Linked Data principles promote the publication and interlinking of data on the web, creating a web of data that is navigable and usable.
Challenges in Linked Data implementation include GUIs, application architectures, schema mapping, link maintenance, licensing, trust, quality, relevance, and privacy; addressing them enhances the web’s potential as an open, interconnected platform. Deployment schemes like the Five-Star and Seven-Star models provide criteria for publishing open data. These models address the clarity, usability, and applicability of open data, emphasising schema documentation and data quality. Finally, the application of LOD in the CH sector is explored through examples like Europeana. Despite its potential in improving data quality and visibility, challenges persist, including issues related to cataloguing, adoption of new standards, and the complexity of Linked Data terminology. The section underscores the need for collaboration and community-driven practices for effective LOD implementation. 3.7.5 LOUD: Design Principles, Communities, Standards LOUD focuses on improving data accessibility primarily for software developers. It balances data completeness and practical considerations like scalability and usability. Its design principles are:
Right Abstraction for the Audience: Tailoring data access to specific user needs.
Few Barriers to Entry: Simplifying initial engagement with the data.
Comprehensibility by Introspection: Ensuring data is largely self-explanatory.
Documentation with Working Examples: Providing clear guidelines and practical use cases.
Consistent Patterns Over Exceptions: Reducing complexity through uniform patterns.
The systematic review of LOUD in scholarly literature employed the weight of evidence framework. A Boolean query identified relevant papers, with findings showing 46 relevant references from 2018-2023, mainly in English. These papers were categorised into four main categories: mentions of LOUD, descriptions of LOUD, explanations of LOUD design principles, and comparative analyses where the LOUD principles have been reused in various applications. LOUD integrates technologies, mostly community-driven, like IIIF, WADM, and Linked Art. IIIF facilitates the sharing of high-resolution images and audiovisual content through a series of specifications, WADM provides a standard for creating and sharing annotations across various platforms, and Linked Art provides a model and an API specification for semantically describing CH. Together, they demonstrate a transformative potential in how CH data is interacted with and understood, reshaping traditional humanities and opening new research opportunities. 3.7.6 Community Practices and Semantic Interoperability In the exploration of community dynamics and the intricacies of data exchange, two axes or perspectives come into focus in my PhD: community practices and semantic interoperability. These axes represent not just isolated concepts but comprehensive frameworks that influence how communities function and thrive in an interconnected world. Community practices, as shared activities and rituals, weave the fabric of collective identity within communities, while semantic interoperability acts as the bridge for meaningful and truthful data exchange between different systems. Both community practices and semantic interoperability will permeate the empirical parts of the thesis as I explore LOUD for CH through different prisms. These axes serve as critical lenses through which we can dissect and analyse the dynamics, challenges, and opportunities that arise within this landscape.
3.7.7 The Case for Cultural Heritage Data One thing that has been partially touched upon but not strongly asserted in this chapter is ‘why do we really care about cultural heritage?’ The importance of CH data as primary and secondary sources for DH practitioners transcends the mere definition, interlinking, and preservation of these sources. The undertaking, notwithstanding technological assistance, can be inherently challenging, contingent upon numerous interdependencies. Moreover, it is deeply rooted in our response to many pressing global challenges, including the far-reaching consequences of climate change largely caused and accelerated by human activities. In an era characterised by profound disruptions, exemplified by events such as heatwaves, fires, droughts, floods, rising sea levels, and the resultant migrations driven by these environmental changes, which affect not only humans but the entire biosphere, it becomes increasingly important to emphasise the societal responsibilities that accompany our engagement with CH as well as DH practices in the Anthropocene [see @nowviskie_digital_2015]. 4. Exploring Relationships through an Actor-Network Theory Lens As Jim Clifford taught me, we need stories (and theories) that are just big enough to gather up the complexities and keep the edges open and greedy for surprising new and old connections. [@haraway_staying_2016 p. 101] This chapter serves as the theoretical framework of the dissertation, and its primary goals are to elucidate the theoretical underpinnings and provide a comprehensive toolbox for addressing the identified problem. In the preceding literature review chapter, I highlighted the issue that necessitates attention around interlinking CH data. The theoretical framework, sometimes referred to as the ‘toolbox’, can be likened to a set of ‘tools’ that will be employed to understand and address this problem. Here, the primary purpose of this chapter is to offer an in-depth exploration of the tools – which comprise various theories, propositions, and concepts – delineating their characteristics, behaviours, historical applications, interrelationships, relevance to the study’s objectives, and potential limitations. Subsequently, the next chapter will elucidate how these tools will be operationalised in the research process. The theoretical framework of this study is firmly rooted in ANT, which will be pursued systematically throughout the research. ANT is a constructivist approach that seeks to elucidate the fundamental dynamics of societies. Unlike traditional perspectives that restrict the concept of an ‘actor’ – or ‘actant’ – to individual humans, ANT expands this notion to encompass non-human and non-individual entities [see @callon_actor-network_1999; @callon_actor_2001; @latour_actor-network_1996; @latour_reassembling_2005]. ANT goes beyond the mere identification of actors and networks; it embodies a comprehensive methodology for exploring the intricate interplay of socio-technical systems. ANT distinguishes itself not only by recognising both human and non-human entities – from individuals and technological artefacts to organisations and standards – as actors (or actants), but also by examining their roles within heterogeneous networks of aligned interests. This approach facilitates a nuanced understanding of enrolment and translation processes, where diverse interests are aligned to form cohesive networks, and the concept of irreversibility, which describes the stabilisation of these networks over time.
In addition, ANT introduces the concepts of black boxes and immutable mobiles, highlighting the persistent yet mobile nature of network elements such as software standards that transcend spatial and temporal boundaries [@lee_actor-network_1997 pp. 468-470]. These concepts are instrumental in dissecting the dynamics of IIIF and Linked Art specifications, which can be considered either full-fledged actors or immutable mobiles, depending on the context of the network under consideration. This dual perspective underscores ANT's role as both a theoretical lens and a methodological tool, providing a robust framework for dissecting the fabric of socio-technical assemblages and enriching our understanding of DH and CH interconnections. Additionally, the theoretical framework is enriched by integrating complementary perspectives from Donna Haraway’s SK, Susan Leigh Star’s BO, and Luciano Floridi’s PI. Each of these frameworks contributes uniquely to our understanding of LOUD and its socio-technical landscape. Haraway’s approach emphasises the contextually-embedded nature of knowledge, underscoring the importance of diverse perspectives in shaping our understanding of technological phenomena [see @haraway_situated_1988]. The concept of BO provides a framework for examining the role of LOUD technologies as mediators among varied groups, highlighting the importance of flexibility and adaptability in technological systems [see @star_institutional_1989; @star_this_2010]. Meanwhile, PI offers a foundational perspective, viewing information as an intrinsic part of the reality that shapes and is shaped by technologies [see @floridi_philosophy_2011]. Collectively, these theories complement the ANT-based approach by providing a multi-faceted understanding of the complexities inherent in LOUD, its technologies, and the communities involved. This ANT-grounded toolbox is composed of three elements:

- Demonstrating how non-human entities exert agency.
- Identifying the human and non-human actors involved in these processes.
- Investigating the concept of translation and the process by which a network can be represented by a single entity.

For example, when considering the design of the PIA data model, a full-fledged actor in its own right, and more broadly for any KR system, pertinent questions arise about the influences exerted by the various groups of individuals involved in the process. These questions concern not only their interactions with each other but also their impact on the manifestation of the model and, consequently, how KRs can influence the various actors, both during the model’s implementation and throughout its creation. In addition, the data model is composed of parts always bigger than the sum of their individual characteristics, as each part not only contributes to the overall functionality but also embodies a complex network of relationships and interactions. This viewpoint, inspired by @latour_whole_2012, asserts that in the realm of social connections, following Gabriel Tarde’s monadological approach – i.e., viewing each individual or element as a self-contained universe or ‘monad’ with its own unique properties and relations [see @tarde_social_2000] – individual elements (such as stakeholders or data points) often carry more information and potential than what is apparent when they are viewed solely as components of a larger system. In this perspective, the complexity and richness of each individual element often surpasses the aggregate.
Thus, the model, potentially acting as a boundary object when not aligned with standardised processes, serves as a site of negotiation and alignment among different stakeholders. As @haraway_staying_2016 [p. 104] poetically puts it, software could also be defined as ‘imploded entities, dense material semiotic “things”’, a notion that underscores the entanglement of information, technology, and materiality. This chapter is organised into four sections, corresponding to the three aforementioned aspects and an additional section focusing on the revised epistemological foundations. First, Section 4.1 explores the dissolution of rigid distinctions between human and non-human actors, emphasising the dynamic and interdependent nature of such relationships. This exploration is fundamental in understanding the broader implications of standards and community engagements in any field. Then, Section 4.2 examines how collectives composed of differing actors can be assembled into a cohesive network where each entity’s agency and influence are recognised. In this section, the concepts of quasi-objects and BOs are introduced to elucidate the role of shared objects and concepts in mediating and facilitating interactions among diverse groups within a network. Section 4.3 investigates the translation process where actors negotiate, modify, influence, and align their interests and identities in the formation and maintenance of networks. Additionally, PI, particularly the SLMS approach, is introduced here to provide a structured understanding of how information and knowledge are conceptualised, managed, and communicated within these networks, offering a deeper insight into the dynamics of standard adoption and community interaction in collaborative efforts. Finally, Section 4.4 revises the epistemological foundations to address the nuanced inquiries presented later in the empirical chapters. In this section, SK is introduced, emphasising the importance of context-specific and perspective-driven knowledge in shaping our understanding of technological and cultural phenomena. This concept, developed by Donna Haraway, advocates for a more critical and reflexive view of knowledge, recognising that all knowledge is situated within specific cultural, historical, and personal contexts. This approach challenges the notion of objective or universal knowledge, asserting that all understanding is partial, located, and contingent. The incorporation of SK is crucial for comprehending how different actors’ perspectives and experiences influence the implementation and interpretation of LOUD standards. It helps in examining how these standards and community participation in IIIF and Linked Art are perceived and enacted differently across various social fabrics, particularly in contrast with settings where these standards and communities are not engaged. Central to this discussion is a question that serves as a common thread, leading into the main research question: ‘How to situate Linked Open Usable Data and to what extent has LOUD shaped or will shape the perception of Linked Data in the broader context of cultural heritage and digital humanities?’ Throughout these sections, ANT forms the underlying common thread, with the other theories augmenting and enriching this comprehensive theoretical framework. Overall, the theoretical framework will be drawn upon to explore what @manovich_cultural_2017 [pp. 60-61] refers to as ‘everything and everybody’.
Borrowing from Haraway’s concept of Tentacular Thinking, this approach recognises the interconnectedness and interdependence of all elements within the research scope, from the minutiae of technical details to the broader societal implications [see @haraway_tentacular_2016]. This comprehensive view is essential for addressing the detailed nuances of technical implementations, yet it is also crucial for understanding their wider societal implications, and for considering the multi-layered complexities involved in the implementation and perception of LOUD.

4.1 Implosion of the Boundaries: Non-humans have Agency

The @oxford_english_dictionary_agency_2023 assigns two primary sets of meanings to the term ‘agency’. The first pertains to ‘a person or organisation acting on behalf of another, or providing a particular service’. The second, of greater relevance to this discussion, relates to an ‘action, capacity to act or exert power’. This second definition is further elaborated as an ‘action or intervention producing a particular effect; means, instrumentality, mediation’. The concept of agency has been a central theme in various philosophical and sociological discourses. In ancient philosophy, both Plato and Aristotle contributed foundational ideas to the concept of agency, each offering distinct perspectives that have significantly influenced subsequent thought. Plato, known for his theory of ideal forms, presented a dualistic view of reality, distinguishing between the world of forms (ideas) and the physical world. Within this framework, he saw agency as the soul’s ability to recognise and conform to these ideal forms [@watson_free_1975 p. 209]. For Plato, true agency involved transcending the physical and sensory world and directing one’s actions according to reason and intellect. This pursuit of knowledge and truth was seen as the highest form of agency, with actions aimed at realising eternal and immutable truths. Plato’s vision of agency is closely tied to knowledge, virtue and the pursuit of the good, as seen in his portrayal of the philosopher-king in the Republic, who governs himself and the state with wisdom and insight [see @plato_republic_360bce]. Aristotle, Plato’s student, offered a more practical and empirical approach. He incorporated agency into his broader ethical framework, placing emphasis on the ability to act virtuously and to make decisions in accordance with a telos or purpose. Aristotle regarded every action and choice as directed towards an end, and this teleological approach is key to understanding his concept of agency. Agency in Aristotle’s philosophy is deeply intertwined with the notions of potentiality and actuality, where potentiality represents inherent capabilities and actuality is their realisation through action. This perspective reinforces the importance of rational deliberation, moral virtue, and the realisation of potentiality in human life. In addition, Aristotle emphasised the role of choice (prohairesis) and practical wisdom (phronesis) in guiding deliberate, rational and virtuous action [see @charles_aristotle_2017]. These ancient philosophical perspectives, with Plato’s focus on reason and alignment with the ideal, and Aristotle’s emphasis on practical wisdom and virtue, set the stage for later philosophical explorations of agency by modern thinkers such as David Hume (1711-1776), Immanuel Kant (1724-1804), and Georg Wilhelm Friedrich Hegel (1770-1831).
In their perspectives, agency is understood as an individual’s capacity to act in the world, based on intentionality and rationality [@pippin_idealism_1991]. Hume’s empiricist approach saw agency as closely tied to the experiences and perceptions of the individual, underlining the role of personal choices and mental states [see @schier_hume_1986]. Kant’s critical philosophy stressed the importance of autonomy and moral law in agency, which he understood as the capacity of individuals to act according to universal moral principles derived from reason. Hegel offered a more dialectical approach, seeing agency as part of a broader historical and social process in which the actions of the individual are interwoven with the unfolding of rational will in history. These classical views of agency focus primarily on human agents and their conscious, intentional actions. In contrast to these traditional perspectives, the advent of ANT and the works of Bruno Latour, John Law, Madeleine Akrich, and Michel Callon mark a significant shift. These academics argue for a more inclusive understanding of agency, where non-human entities – ranging from technological artefacts to animals and even ideas – can also be agents that influence and shape the course of social events. This perspective is a departure from the anthropocentric view of agency and opens up new ways of understanding social dynamics and networks. The concept of the actant, as introduced by Algirdas Julien Greimas (1917-1992), is pivotal in this context. In Greimas’ semiotic theory, an actant can be any entity, human or non-human, that contributes to the progress of a narrative. This concept significantly expands the traditional narrative framework established by Vladimir Propp (1895-1970), which focused mainly on human characters and their roles in folk tales. Propp’s analysis centred on the actions and roles of recurring character types, which he categorised into a standardised framework. Greimas’ approach to narratives, as referenced by @boullier_medialab_2018, emphasises the potential of any entity to play a role in a story, thereby expanding the narrative scope.

The agency’s move is based on a well-known but seldom mentioned loan from Greimas’ 1966 semiotics. The concept of “actant” allowed the potential arrangement of any entity that populated the narratives to be aligned beyond Propp’s tradition. While Greimas’ formalism was certainly not preserved, the principle allowed for more open stories to be told and the concept of “allies” to be formalised, in particular, which extended the idea of “adjuvants” and “opponents” (without this being done from a strategic perspective, contrary to some interpretations). [@boullier_medialab_2018]

This idea of the actant resonates strongly within ANT, as it aligns with the theory’s aim to dissolve the strict dichotomy between human and non-human actors. In ANT, actants are not limited to individuals or even sentient beings; they include any entity that can affect or be affected by the network. This redefinition of agency through the lens of ANT and the concept of the actant is a cornerstone in understanding the complex, interconnected networks that constitute social and technological realms. It allows for a broader and more nuanced understanding of how various elements within a network interact and influence one another, regardless of their traditional classification as human or non-human. ANT provides a radical redefinition of agency, challenging the modernist and post-modernist interpretations.
It proposes an amodern viewpoint [@latour_postmodern_1990], dissolving the dichotomy between human and non-human agency, and focusing on the network dynamics in a society where ‘contemporary techno-science consist of intersections or “hybrids” of the human subject, language, and the external world of things, and these hybrids are as real as their constituent’ [@bolter_remediation_1999 p. 57]. Agency, in ANT, includes not just intentional actions but the capacity of any entity to affect or be affected in a relational network [@latour_actor-network_1996]. This expansive view of agency, influenced by the concept of the actant and informed by Greimas’ semiotics, offers a more holistic understanding of actors within networks. Adopting a Latourian approach, researchers such as anthropologists and sociologists are encouraged to observe the balance between human and non-human properties within networks. This balanced observation is crucial for a full understanding of the dynamics within these structures [@latour_we_1993 p. 96]. By acknowledging the agency of both human and non-human agents, and by recognising the blurred boundaries between subjects and objects, researchers can gain deeper insights into the complex interplay of forces that shape social realities.

4.2 Assembling the Collective

Following the exploration of agency in the previous section, the assembly of collectives is examined. This process involves identifying a myriad of actors, both human and non-human, and understanding how they coalesce into actor-networks. The assembly of such a collective, a differing cosmos of socio-technical agents, is predicated on the recognition of each actor’s unique agency and the dynamic interplay of relationships that bind them together. The transformation challenges of tools for DH and object knowledge, as discussed by @camus_digital_2013, highlight the complexities involved in integrating DH with traditional scholarly practices. The author emphasises the need for a nuanced approach to the digitisation and dissemination of CH resources, underscoring the pivotal role of collaborative efforts in bridging the gap between technology and humanities scholarship. This analysis aligns with the ANT perspective, which advocates for recognising the contributions of diverse actors within the wider DH ecosystem. Within the LOUD ecosystem, a diverse set of actors comprises individual contributors, institutions, and several groups and committees, each with its own set of objectives. This ensemble also includes specifications and compatible software that facilitate interoperability, as well as end users. Interestingly, the majority of these end users remain unaware – whether through seamless integration or simply because their interaction does not require conscious recognition – that their digital interactions are often mediated by or compliant with LOUD standards. This diversity underscores the importance of understanding how different actors, their objectives, and their contributions shape the development and adoption of LOUD specifications and practices. Transitioning from the depiction of the LOUD ecosystem’s varied participants, the concept put forth by @gandon_web_2019 for an envisioned web architecture introduces a collective composed of diverse natural intelligences – such as humans, connected animals and plants – and artificial intelligences, including entities capable of reasoning and learning.
This shift marks a deeper recognition of the layered interactions that form the backbone of digital platforms, and points to a future where technology adapts to embrace a wider range of intelligences within the structure of the web. Classifying these diverse actors involves understanding their roles and interactions within the network. Quasi-objects and BOs provide frameworks for this classification, enabling a nuanced understanding of the socio-technical assemblage and facilitating communication among its varied components. The foundation established by ANT, incorporating the concept of quasi-objects from @serres_parasite_2014, challenges traditional categorisations with entities embodying characteristics of both subjects and objects. This conceptual framework is crucial for appreciating how non-human entities can exhibit agency and actively participate in social networks, thus broadening our understanding of actor-network dynamics. Quasi-objects, existing in a state of flux and embodying characteristics of both subjects and objects, challenge our conventional understanding of agency. Concurrently, the concept of BOs is introduced, enriching the ANT-grounded toolbox by highlighting the role of shared objects and concepts in mediating and facilitating interactions among diverse groups within the network. Unlike quasi-objects, which symbolise a hybrid state between subjectivity and objectivity, BOs focus on interaction and communication. They are crucial in collaborative efforts, especially in diverse and interdisciplinary settings, by maintaining a common identity across various contexts while being interpreted differently in each. Understanding the distinction and the interplay between quasi-objects and BOs is vital for comprehending the dynamics of actor-networks. The introduction of BOs in this section elucidates their role in mediating complex socio-technical interactions, highlighting the importance of BOs in community-based initiatives and their broader impact. Star’s reflection on BOs underscores their significance:

Boundary objects are objects which are both plastic enough to adapt to local needs and constraints of the several parties employing them, yet robust enough to maintain a common identity across sites. They are weakly structured in common use, and become strongly structured in individual-site use. They may be abstract or concrete. They have different meanings in different social worlds but their structure is common enough to more than one world to make them recognizable, a means of translation. The creation and management of boundary objects is key in developing and maintaining coherence across intersecting social worlds. [@star_ethnography_1999 p. 393]

The relevance of BOs extends to the restructuring of residual categories through cycles of standardisation attempts that create said boundary objects, as illustrated in Figure 4.1. This cycle emphasises the negotiation and alignment among different stakeholders, underscoring the adaptive and flexible nature of BOs in managing the complexities of standardisation and the varied interpretations across social worlds [see @star_this_2010].

Figure 4.1: Relationships Between Residual Categories, BOs, and Standardisation Attempts. Adapted from @star_this_2010

The integration of concepts such as quasi-objects and BOs within the ANT-based toolbox necessitates a reevaluation of existing ontological frameworks. This shift leads to a more interconnected understanding of societal dynamics, recognising the central role of diverse actors in forming networks.
The adoption of these concepts in ANT requires a re-evaluation of existing ontologies. This re-evaluation involves redefining our understanding of agency, action and influence to include a wide range of actors, both human and non-human. This approach leads to a more nuanced and interconnected understanding of society, where traditional boundaries between different types of actors are actively re-imagined. In this way, ANT argues for a network-like ontology and social theory that can fully integrate the influence and interactions of disparate actors within society [@latour_actor-network_1996 p. 370]. In engaging with the complexities of assembling the collective, we encounter the necessity to cease replicating the archontic principle, as discussed by @derrida_mal_2008. This principle, which dictates the preservation and accumulation of knowledge under a single authority or location, is challenged by the fluid and distributed nature of ANT's networks. Derrida’s critique invites a rethinking of how knowledge and information are curated and disseminated, echoing the ANT perspective that stresses the distributed, multifaceted interactions of actors within a network. The dialogue with Derrida’s deconstruction of the archive complements the ANT approach by advocating a more open, inclusive understanding of how socio-technical collectives are formed and maintained. For instance, the examination of BOs in the context of queer identities illustrates the potential of these concepts to challenge and redefine traditional typologies, offering new perspectives on identity and community formation within socio-technical networks [see @junginger_categorizing_2021]. Having assembled our collective and identified the diverse actor-networks, the focus shifts to exploring the relationships and communication mechanisms among them. This exploration, conducted in the subsequent section, is crucial for understanding how different actors – ranging from quasi-objects to boundary objects – contribute to and shape the collective narrative and functionality of a given network.

4.3 The Translation Process

The translation process within ANT refers to the dynamics of establishing associations and networks among diverse actors. Latour describes translation as a specialised relation that transforms mediators into coexistent entities without directly transferring causality. This concept underscores the complexity of interactions within networks, emphasising the creation and maintenance of associations that extend beyond mere causal relationships.

[The] word “translation” now takes on a somewhat specialised meaning: a relation that does not transport causality but induces two mediators into coexisting. [@latour_reassembling_2005 p. 119]

Associations within ANT evolve continuously, demonstrating the ongoing and emergent nature of actor-networks. They materialise through the interactions and negotiations among actors. This perspective broadens the understanding of networks by stressing their fluid nature. Understanding the dynamics of actor-networks requires an exploration of the ways in which actors influence each other. This is especially important in order to capture the nuanced roles and interactions within these networks. A key contribution in this area has been made by Latour, who suggests a compelling structure for disentangling these interactions as ‘mediation’. Latour’s conceptualisation is particularly insightful in that it breaks down the nature of influence into distinct yet interrelated processes.
The essence of his argument can be summarised as follows:

This capacity of actors to influence each other was defined by @latour_reassembling_2005 as mediation, further broken down into four types: interference, composition, black boxing, and delegation. Interference appears when one actor interferes with the goal of another. In composition, the actors influence the common goal of the network together. Black-boxing is when gradual complexification of actors (and their interrelations) reaches a point where treating the constellation as a single actor becomes more meaningful, and delegation is when meaning and expression is delegated to non-human objects. [@czahajda_live_2022 p. 3]

In the context of the LOUD space, treating a constellation of actors necessitates a focus on delegation, a process pivotal for understanding how meanings and functions are assigned within networks. This emphasis on delegation underscores the necessity of expanding our epistemological horizons to encompass PI. Integrating PI offers a comprehensive framework for analysing how information not only mediates relationships within these networks but also influences the reality of digital ecosystems. Such an expanded perspective is essential for a thorough understanding of the dynamics within the LOUD ecosystem and its broader implications. PI is conceived as a groundbreaking approach that examines the nature and dynamics of information. It explores how information fundamentally structures reality and thereby shapes the entities within it, termed inforgs, or informational organisms. These entities are embedded in information environments where they engage and interact in a vast information ecosystem. @floridi_information_2010’s work illuminates the ways in which information underpins and transforms our understanding of reality, and suggests that living in the information age means recognising our role and identity as part of a complex, interconnected informational world. This perspective invites a deeper reflection on the implications of living in the midst of vast information networks, and urges a re-evaluation of how information influences human identity, society and our wider interaction with the digital and natural worlds. Diving deeper into Floridi’s PI, the LoA concept emerges as a critical tool for dissecting the intricacies of complex computational systems, including LOUD-compliant ecosystems. This approach, fundamental to PI [@ganascia_abstraction_2015], aids in navigating the multifaceted layers and perspectives inherent in these systems [@angius_philosophy_2021], particularly those structured around client-server architectures. LoA provide a structured way to analyse and understand complex systems by breaking them down into different layers or perspectives. Each level focuses on specific aspects of the system, allowing for a clearer analysis of its components and their interactions. By advocating for a separation of concerns, the LoA framework equips us with a strategic method to manage and simplify complexity, enabling a focused examination of distinct abstraction levels within digital ecosystems [@van_leeuwen_floridis_2014]. Building on the foundation laid by Floridi’s LoA, @selbst_fairness_2019 critique the fair ML field’s reliance on abstraction and modular design for achieving fairness, identifying five abstraction traps that highlight the challenges of applying computational interventions to societal contexts without considering the interplay between social context and technical systems.
This critique underscores the relevance of incorporating a socio-technical perspective in this thesis, emphasising the need to make use of STS methodologies in the design process to avoid these abstraction traps. Considering the insights from @selbst_fairness_2019, this thesis will explore four levels of LoA and one transversal dimension within LOUD ecosystems, acknowledging that each can act as its own actor-network or collectively form a singular network. These levels, from low to higher abstraction, include Algorithmic and Computational Processes, Infrastructure, Data Model, and Representation and Display. Societal implications, integrated across all levels, will address the broader cultural and social impacts.

- Algorithmic and Computational Processes: This level explores the specific algorithms, such as image conversion scripts or sorting algorithms, and the computational frameworks leveraged within the LOUD ecosystem. For example, analysing how a recommendation algorithm influences the accessibility of digital resources.
- Infrastructure: This level focuses on the servers, cloud-based services, and micro-services that support the ecosystem. This includes assessing the deployment of server architectures that facilitate scalable data storage and retrieval, like using Amazon Web Services or GitHub Pages for hosting LOUD data.
- Data Model: This level covers the organisation and structuring of data, including metadata standards and LOUD specifications. An example would be examining how CIDOC-CRM is used in Linked Art.
- Representation and Display: This level not only encompasses the JSON-LD outputs for effective data sharing and interlinking but also focuses on how users see and interact with LOUD representations across different platforms – browsers, viewers, and players. It examines the GUIs that make digital collections accessible and engaging, for example, how a virtual gallery allows users to explore a digital exhibit.

In the exploration of LOUD ecosystems, the concepts of immediacy and hypermediacy, as delineated by @bolter_remediation_1999, provide additional insightful perspectives on the role of interfaces as LoA. Immediacy refers to the design of interfaces that aim to create a seamless, transparent UX, making the technology invisible and allowing for direct interaction with the content. This is evident in interfaces such as OSD, which displays high-resolution images and strives to provide a smooth and immersive viewing experience by minimising the perceptibility of the API-compliant resource. On the other hand, hypermediacy emphasises the presence and visibility of the medium, drawing attention to the various forms of mediation. Interfaces embodying hypermediacy offer a multi-layered, heterogeneous presentation, making users aware of the different media elements and their interactions, e.g. Exhibit for storytelling purposes. This duality enriches the user’s digital encounter, underscoring the need for LOUD-compliant tools and services to mediate these experiences effectively. By embracing these concepts, the framework can strategically leverage interfaces to either conceal or reveal the intricacies of the digital medium, facilitating a nuanced engagement with information ecosystems. Figure 4.2 presents a comprehensive view of the LOUD ecosystem’s LoA, enriched by the inclusion of societal implications as a cross-cutting dimension and the incorporation of immediacy and hypermediacy as critical concepts at the representation and display level for understanding user interaction and interface design.
Figure 4.2: Exploring Levels of Abstraction in LOUD Ecosystems: Integrating Societal Implications with the Concepts of Immediacy and Hypermediacy

The SLMS scheme, which includes a framework for each identified LoA, equips this research with a comprehensive lens for analysing computational systems, revealing how they can be effectively combined with ANT. This methodology, as depicted in Figure 4.3, enriches the exploration of digital ecosystems. This combination offers a unique perspective on understanding the interplay between computational entities and the broader networks they inhabit. The SLMS scheme can be summarised as follows:

- System: This refers to any entity or collection of entities that can be studied. It could be anything from a physical object, a biological entity, to a social or computational system.
- LoA: Floridi emphasises the importance of levels of abstraction in understanding and analysing systems. A level of abstraction is a way of observing a system, focusing on certain aspects while ignoring others. It is a conceptual framework or lens through which we can understand complex systems.
- Model: At each level of abstraction, we create models of the system. These models are simplifications or representations of the system that highlight certain features while omitting others, based on the chosen level of abstraction.
- Structure: This refers to the organisation or arrangement of the components within the system, as understood or represented at a given level of abstraction.

Figure 4.3: The SLMS According to [@floridi_method_2008]

@gobbo_what_2016 expand on the SLMS scheme by highlighting the challenges and intricacies of quantifying and qualifying computational information. They advocate for a comprehensive methodology that appreciates both the physical and conceptual dimensions of data, facilitating a deeper understanding of programmable artefacts and their informational content. This perspective not only complements the analytical capabilities of ANT but also opens new avenues for investigating the dynamics of information and technology. As I venture to revise the epistemological foundations and introduce Haraway’s concept of SK, it becomes increasingly manifest that the integration of ANT with Floridi’s PI and the insights of computational information theory provides a robust framework for exploring the complexities of digital and networked environments. This interdisciplinary approach lays the groundwork for a comprehensive exploration of the digital world, emphasising the importance of situated, contextual knowledge in understanding and navigating the digital landscape.

4.4 Epistemological Foundations

This section establishes the epistemological foundations, presenting ANT, BO, and PI, alongside Donna Haraway’s SK. Rather than synthesising these theories, this chapter places an emphasis on situating LOUD within a feminist perspective to construct a new materialistic foundation reminiscent of Haraway’s approach [@haraway_staying_2016 p. 42]. This assemblage seeks to navigate the controversies and mappings within LOUD-like communities, applying a Tardian approach to trace the spreadability of ideas [@latour_whole_2012]. To analyse the relevant actor-networks effectively, especially being part of both the IIIF and Linked Art communities, a particular lens is required. ANT, while expansive, has faced criticism for its perceived flatness in analysing networks.
Here, Haraway’s SK becomes instrumental, providing a stance that enriches the ANT-grounded theoretical framework with a comprehensive lens that prioritises context in shaping knowledge. SK emphasises that knowledge is always situated, partial, and contextually produced, offering a critical perspective on determining relevance within networks. SK complements ANT by adding depth to the analysis of actor-networks. It highlights the significance of context – both human and non-human – in the production of knowledge, thereby enriching the theoretical framework with a nuanced lens for exploring the dynamics within the LOUD space. SK, as articulated by @haraway_situated_1988, emphasises the contextual nature of knowledge and challenges the pursuit of an objective, universal truth divorced from the position of the knower. Haraway’s framework, which integrates standpoint theory, concedes that knowledge is inherently shaped by its social, cultural and historical context, and supports an understanding of knowledge as partial and situated. This approach, noting the influence of epistemological privilege and intersectionality, argues against universalism by stressing the importance of being conscious of the specific perspectives and biases that inform one’s understanding. The relevance of SK to ANT lies in its complementary perspective of acknowledging the diverse, context-specific factors that influence knowledge production within networks, enriching ANT's analysis of actor-networks by incorporating a critical, reflexive lens on the situatedness of knowledge. In forging this theoretical framework, I seek to transcend the notion of merely juxtaposing disparate ideas. Instead, I aim to weave their contributions into a coherent web of thought, ensuring a seamless and comprehensive framework that embodies the essence of their respective insights, one that transcends disciplinary boundaries. The theoretical framework can be synthesised as follows, interweaving distinct yet complementary perspectives to enrich our understanding of socio-technical ecosystems:

- SK, by advocating for an understanding of knowledge as inherently partial and situated, complements ANT by adding depth to the analysis of actor-networks. It enriches the theoretical framework with a nuanced lens for exploring the dynamics within the LOUD space.
- BO further refine this framework by offering a means to characterise actors and mediate interactions within the network, facilitating connectivity and translation among diverse groups.
- PI enriches ANT by framing information as a fundamental element in actor-network interactions. It offers insights into the different LoA, or different actor-networks, each with a different perspective or granularity, and into how information is created, shared, and used, and how these processes influence the relationships and dynamics within networks.

Figure 4.4 illustrates how these theories intertwine to form the epistemological foundation of this research, demonstrating the synergistic potential of combining ANT with SK, BO, and PI to navigate the complexities of digital ecosystems in the CH field.

Figure 4.4: A Sympoiesis of Theories: ANT Entangled with SK, BO, and PI

Concluding this chapter, the developed toolbox lays a coherent foundation for empirical research, poised to explore the dynamics within actor-networks. This exploration is not entirely novel in the context of CH, as exemplified by @guillem_faire_2023’s use of ANT to spotlight the keystones that were destroyed by the fire at Notre-Dame de Paris.
As this narrative unfolds, the ANT-based toolbox, enriched with amodern and feminist perspectives, will be instrumental in navigating the forthcoming empirical landscapes.

5. Research Scope and Methodology

This chapter delineates the Research Scope and Methodology, laying the groundwork for the empirical exploration within this thesis. (…)

6. The Social Fabrics of IIIF and Linked Art

(…)

7. PIA as a Laboratory

(…)

8. Yale’s LUX and LOUD Consistency

(…)

9. Discussion

[We] must abandon the idea of achieving syntactic or structural interoperability through the use of a single model, whether for production, storage, or exploitation within an [information system] itself. [@poupeau_reflexions_2018] [114]

This chapter presents a comprehensive discussion where I interpret, analyse and critically examine my findings in relation to the thesis and the wider application of LOUD. Through an in-depth analysis of the design principles of LOUD and their implications for CH, this discussion aims to demonstrate the many challenges and opportunities inherent in this framework. The focus is on achieving community-driven consensus, rather than simply pursuing technological breakthroughs. The following sections are organised to provide a comprehensive review of the empirical findings, an evaluation abstracting LOUD, and a retrospective analysis of the research journey. Firstly, in Section 9.1, I will present a summary of the empirical findings from my research. This will include key themes and insights, structured to reflect the different areas of study and practice within LOUD. Secondly, in Section 9.2, I will provide an evaluation of LOUD by means of the LoA approach. This evaluation will focus on the impact of LOUD on the perception of Linked Data within the CH domain and the wider DH field. It will include the key themes and insights that have emerged, structured in a way that reflects four levels of abstraction. I will also explore the dual nature of LOUD implementation, involving both simplicity and complexity, and discuss the various factors that influence such dynamics. Finally, in Section 9.3, I will offer a retrospective analysis of the research journey. This section will interpret the findings to situate the LOUD specifications as fully-fledged actors. It will reflect on the challenges, achievements, and lessons learned throughout the research process, providing a holistic view of the project’s trajectory and its implications for the future of LOUD.

9.1 Empirical Findings

This section summarises the empirical findings of my research and already offers some suggestions. The structure does not follow the exact order of the three empirical chapters but is organised around overarching topics that emerged throughout the study. The seven topics are Community Practices and Standards, Inclusion and Marginalised Groups, Maintenance and Community Engagement, Interoperability and Usability, Future Directions and Sustainability, Digital Materiality and Representation, as well as Challenges of Scaling and Implementation.

Community Practices and Standards

GitHub serves as a vital hub for community involvement, with a core group of active contributors often attending meetings regularly. This platform simplifies decision-making within the community, although it also reflects biases similar to those in FLOSS communities. Behind visible activities like meetings, there is substantial preparatory work managed by co-chairs, editorial boards, or driven by community-generated use cases.
This foundational work often determines the direction and outcomes of formal gatherings. The LUX project at Yale, as seen in Chapter 8, has successfully fostered collaboration across various units, bringing together libraries and museums on a unified platform. The technological foundation of LUX, based on open standards, facilitates data integration and cross-collections discovery. Not only does the deployment of FLOSS tools contribute to these achievements, but it also emphasises the social advantages of working collaboratively. The concept of the Tragedy of the Commons, as described by @hardin_tragedy_1968, highlights the potential for individual self-interest to deplete shared resources. However, @ostrom_governing_1990 offers a counterpoint by demonstrating how communities can successfully manage common resources through collective action and shared norms. In this context, initiatives like CHAOSS[115] play a significant role by providing metrics that help evaluate the health and sustainability of open source communities. These metrics include contributions, issue resolution times, and community growth, offering valuable insights into how collaborative efforts can be maintained and improved. Reaching consensus is another critical aspect of community practices and standards. While the minutes of meetings are valuable artefacts, they often reflect an Anglo-Saxon approach to decision-making characterised by few substantive points and critical turning points. The formal aspects of conversations captured in minutes do not fully encompass the decision-making process, which frequently involves informal conversations, consensus-building through open dialogue, and subtle cues that influence outcomes. These elements are integral to the English and American approach and hold valuable lessons for an international community. IIIF and Linked Art are international communities, but decisions are made in English and the majority of participants are based in North America and the UK, significantly imprinting this approach. Understanding these nuances can help us improve our collaborative efforts within the IIIF and Linked Art communities. By recognising and appreciating these different facets of decision-making, we can learn from each other and enhance our collective ability to make effective and inclusive decisions. Some of the challenges associated with these practices include the major demand on resources for community building, the slowness inherent in distributed development, and the difficulty in achieving consensus. Additionally, the concept of social sustainability can be seen as an imaginary construct that papers over differences, as discussed by @fitzpatrick_generous_2019. Addressing these challenges is crucial for the long-term success and effectiveness of the IIIF and Linked Art communities.

Inclusion and Marginalised Groups

The demographic homogeneity in these communities can perpetuate biases and neglect issues relevant to underrepresented or marginalised groups, as seen in Chapter 6. Participation in these standardisation processes is itself a privilege. The assumption that internet access and digital devices are universally available is critically examined, revealing key actors in the digital landscape. This mirrors issues within the IIIF community, where generating IIIF resources presupposes means that may not be accessible to all. We need clear terms of inclusion, as highlighted by @hoffmann_terms_2021.
She argues that effective inclusion requires a critical examination of the frameworks and conditions under which inclusion is offered. The framework should ensure that inclusion initiatives do not merely add diversity to existing power structures but work to transform these structures fundamentally. This involves questioning who defines the terms of inclusion, who benefits from them, and who may be inadvertently excluded. @hoffmann_terms_2021 suggests a participatory approach, where marginalised communities are actively involved in shaping inclusion policies and practices, thus making inclusion an ongoing, reflective process rather than a static goal. The inclusion of marginalised groups is a necessary step, but it is not sufficient. To truly make a difference, there must be a strategic and concentrated effort to appropriate technologies, as emphasised by [@morales_apropiacion_2009; @morales_imaginacion_2017; @morales_apropiacion_2018] and further articulated by [@martinez_demarco_empowering_2019; @martinez_demarco_digital_2023]. This strategic approach highlights the political significance of challenging dominant neoliberal and consumerist perspectives on technology and individual engagement. @martinez_demarco_digital_2023 underscores the critical importance of focusing on practices that go beyond mere inclusion. Instead, it requires a deep understanding and critical assessment of how technology is intertwined with social, economic, and ideological contexts. It implies a reflective and deliberate process of technology adoption in which individuals creatively tailor technology to their specific needs, beliefs, and interests. Moreover, a key aspect highlighted by @martinez_demarco_digital_2023 is the implicit and explicit critique of a universalist approach to inclusion, which often lends itself to all too easy instrumentalisation. Understanding and studying resistance to inclusion in an oppressive digital transformation context is paramount, particularly given the highly unequal conditions that prevail. In this light, a comprehensive study of the socio-material and symbolic processes and practices involved in embedding technologies into individuals’ lives is needed. This approach also recognises technology as a catalyst for change. It envisions the use of technology to drive meaningful change across multiple dimensions and realities – national, societal, or personal. By focusing on these practices, empowering individuals to navigate and use technology thoughtfully and purposefully becomes a reality, bridging the gap between technological advances and societal progress [@martinez_demarco_empowering_2019].

Maintenance and Community Engagement

The tension between creating advanced specifications and their practical implementation by platforms is evident in the IIIF Cookbook recipes and Linked Art patterns, as discussed in Chapter 6. This ongoing development shows that the community is still finding the best ways to achieve broad adoption and interoperability. The deployment of the Change Discovery API, as illustrated in Chapter 7, demonstrates that establishing such a protocol on top of the IIIF Presentation API is feasible and straightforward; a brief sketch of what this looks like in practice follows below. High-level support from leadership, particularly Susan Gibbons as Vice Provost, has been crucial in building trust and ensuring the project’s success as a valuable discovery layer at Yale.
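The sketch below shows a minimal, hypothetical page of the Change Discovery API (all example.org URIs are invented for illustration): an ActivityStreams OrderedCollectionPage listing activities such as Create, Update, and Delete that point at Presentation API resources. Because such pages can be published as static JSON-LD files alongside existing Manifests, the protocol is comparatively light to deploy.

```json
{
  "@context": "http://iiif.io/api/discovery/1/context.json",
  "id": "https://example.org/activity/page-0",
  "type": "OrderedCollectionPage",
  "partOf": {
    "id": "https://example.org/activity/all-changes",
    "type": "OrderedCollection"
  },
  "orderedItems": [
    {
      "type": "Update",
      "object": {
        "id": "https://example.org/iiif/album/manifest",
        "type": "Manifest"
      },
      "endTime": "2024-05-01T10:00:00Z"
    }
  ]
}
```

Consumers crawl these pages in order and re-fetch the referenced Manifests, which is how aggregators can keep their indexes in sync without bespoke harvesting agreements.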
This integration of diverse collections through a unified platform, based on open standards, highlights the potential for transforming teaching, learning, and research by leveraging collaborative efforts. The topic modelling exercise in LUX reveals the intricate actor-networks composed of organisations, individuals, and non-human actors. This analysis underscores the importance of ongoing processes and relationships in maintaining and evolving infrastructure, akin to the concept of ‘infrastructuring’. As detailed in Chapter 8, following best practices and guidelines such as the SHARED Principles is essential for better involvement, but it is also crucial to uphold these commitments consistently over the long term to ensure meaningful participation. Between the PIA team members, there were sometimes ‘disconnects between different communities who undertake collaborative research’ [@vienni-baptista_foundations_2023]. This was something we had to navigate and learn from, which was manageable within the context of a laboratory setting. However, for any follow-up projects or whatever forms the digital infrastructure we built may take, it is imperative that these disconnects are addressed and resolved to ensure cohesive and sustained community engagement.

Interoperability and Usability

Within PIA, different APIs have been progressively deployed to meet various requirements while allowing parallel exploration of data modelling. Each API offers unique advantages, but their collective integration promotes semantic interoperability. For example, the IIIF Image API has been instrumental in rationalising image distribution across prototypes, providing efficient access to high-quality digital surrogates and the ability to resize them for different uses (a short sketch of this mechanism follows below). Adherence to LOUD standards and schemas within LUX has generally been positive, although transitioning between versions of a specification can present challenges, highlighting the need to improve the consistency of compliant resources. Linked Art, for instance, has the capacity to generate various insights and sources of truth around different entities. However, additional or entirely new vocabularies, beyond those from sources like the Getty, may need to be used – such as Homosaurus. Complementary to Linked Art, using WADM allows for assertions that go beyond purely descriptive narratives, though it may sacrifice some semantic richness. This complexity in managing vocabularies and maintaining semantic richness directly ties into broader usability considerations within the community. Addressing these usability concerns, Robert Sanderson has suggested focusing on the use of full URIs in Linked Art to ensure computational usability, in contrast to IIIF’s approach of minimising URIs to enhance readability. This difference highlights a fundamental question in usability: balancing readability and computational usability. Understanding developers’ perspectives on these approaches is critical. As a way forward, I would suggest that the IIIF and Linked Art communities focus on further improving the usability of the specifications. This includes conducting comprehensive usability assessments of APIs to evaluate the experiences of new developers versus existing ones, understanding the steepness of the learning curve associated with each API, and guiding improvements in documentation, on-boarding processes, and overall developer support.
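The mechanism referred to above rests on the Image API pairing a small, introspectable service description (info.json) with a parameterised URI template, {id}/{region}/{size}/{rotation}/{quality}.{format}. Under this scheme, a request such as https://example.org/iiif/photo01/full/!800,600/0/default.jpg (a hypothetical URI) returns a derivative scaled to fit within 800 × 600 pixels, precisely the kind of resizing PIA relied on for its prototypes. A minimal Image API 3.0 service description might look as follows:

```json
{
  "@context": "http://iiif.io/api/image/3/context.json",
  "id": "https://example.org/iiif/photo01",
  "type": "ImageService3",
  "protocol": "http://iiif.io/api/image",
  "profile": "level2",
  "width": 6000,
  "height": 4000
}
```

Part of the Image API’s usability lies in the fact that everything needed to construct valid image requests is contained in these few keys; there is no further negotiation or schema for a newcomer to learn.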
Efforts should be made to lower the barriers to entry for new developers by developing more intuitive and user-friendly tutorials, providing example projects, and creating a robust support community. Ensuring that developers can quickly and effectively leverage APIs will foster greater adoption. Addressing the challenges of transitioning between different versions of specifications is critical, and developing tools and guidelines that help maintain consistency across versions will reduce friction and ensure smoother updates.

Future Directions and Sustainability

Survey findings, as discussed in Chapter 6, underscore the need for ongoing efforts to develop LOUD standards that foster an inclusive, dynamic digital ecosystem. Future strategies should include creating educational resources and frameworks that support interdisciplinary collaboration and reduce barriers to participation. While the Manifest serves as the fundamental unit within IIIF, the Linked Art protocol can play a similarly central role as a semantic gateway in broader contexts, allowing round-tripping across the APIs. The topic modelling exercise in LUX, detailed in Chapter 8, reveals complex actor-networks of organisations, individuals, and non-human actors, providing insights into the relationships sustaining the LUX initiative. The next steps for Linked Art might involve forming a new consortium independent of a CIDOC Working Group, which could provide the necessary support to sustain the initiative. Alternatively, integrating Linked Art into IIIF as a new TSG and specification could address the discovery challenges within IIIF, as discussed during the birds-of-a-feather session led by Robert Sanderson [see @raemy_notes_2024] at the 2024 IIIF Conference in Los Angeles[116]. Design principles that act as bridges across different disciplines, as proposed by @roke_pragmatic_2022, are crucial. IIIF has demonstrated that this collaborative approach is feasible, and Linked Art could follow in its footsteps. However, achieving this requires increased dedication from passive members and broader adoption of the model and the API ecosystem in the near future.

Digital Materiality and Representation

As explored in Chapter 7, the detailed digital representation of photographic albums, such as the Kreis Family Collection, demonstrates the need to comprehensively capture the materiality of digital objects. This includes the structure and context of images, which are crucial for maintaining their historical and social significance. The implementation of the IIIF Presentation API in creating a detailed digital replica of the Getty’s Bayard Album shows how digital materiality can be enhanced through thoughtful use of technology, but also highlights the scalability challenges of such detailed representations. Creating these detailed digital representations can be seen as a ‘boutique’ approach, which, while labour-intensive and resource-demanding, is necessary for preserving the integrity and contextual significance of cultural heritage objects. The challenge lies in developing the appropriate means and methodologies to achieve this level of detail consistently. Future endeavours, whether through research projects or collaborative efforts between GLAM institutions and DH practitioners, should aim to address these challenges and create sustainable practices for digital materiality and representation.
As Edwards aptly notes: ‘Presentational forms equally reflect specific intent in the use and value of the photographs they embed, to the extent that the objects that embed photographs are in many cases meaningless without their photographs; for instance, empty frames or albums. These objects are only invigorated when they are again in conjunction with the images with which they have a symbiotic relationship, for display functions not only make the thing itself visible but make it more visible in certain ways’ [@edwards_photographs_2004 p. 11].

Challenges of Scaling and Implementation

As seen in Chapter 6, the IIIF Cookbook recipes and Linked Art patterns reflect the tension between creating advanced specifications and their practical implementation. This gap between ideation and real-world application underscores the challenges faced by the community in achieving broad adoption and interoperability. In Chapter 7, the exploration of APIs like the IIIF Change Discovery API illustrates the practical challenges and potential of scaling these technologies for wider adoption. The successful implementation in PIA demonstrates viability, but also points to the need for continued development and community engagement to fully realise the benefits. Furthermore, assessing the scalability of IIIF image servers, as discussed by @duin_webassembly_2022 and exemplified by the firm Q42 with their edge-based service Micrio[117], highlights the importance of optimising data performance. Erwin Verbruggen aptly noted that ‘optimising data performance in my opinion means sending as little data over as needed’[118], emphasising the need for efficient data handling to enhance scalability. This insight reinforces the necessity of continual refinement in scaling digital infrastructure to support broader use and integration. Reflecting on these findings, I would like to assert that continuous participation, particularly for institutions that can afford to be part of initiatives like IIIF-C, is essential. Active members should not only focus on their own use cases but also consider the needs and perspectives of other, perhaps marginalised, groups. Achieving the dual goals of making progress within one community, whether it be IIIF or Linked Art, while also engaging in effective outreach and creating a solid baseline, will benefit everyone in the CH sector and beyond. Addressing where LOUD fits in, how people perceive this new concept or paradigm, and understanding how LOUD differs from Linked Data in general are essential. These questions help to clarify the stages at which themes related to one of the LOUD design principles emerge, crystallise, and potentially disappear. My thesis does not fully resolve these queries but offers insights and hints for further exploration. In conclusion, the empirical findings reveal the richness of the implementation and maintenance of LOUD standards in the CH domain. From the critical role of community practices and standards to the challenges of achieving interoperability and inclusivity, each theme underlines the complex interplay of social, technical and organisational factors. Section 9.2 will look at the evaluation of LOUD and explore its overall impact, delving into the delta of what to do with it, particularly in terms of Linked Data versus LOUD, where my thesis provides pointers but not definitive answers.
9.2 Evaluation: Abstracting LOUD In this section, I will assess the impact of LOUD within the CH domain and the wider DH field, examining its implications for community practices and semantic interoperability, and secondarily whether LOUD has affected the perception of Linked Data. Referring to Figure 4.2, the following is a descriptive attempt to provide levels of abstraction of LOUD based on my empirical findings, focusing particularly on the deployment of IIIF within PIA and Linked Art within the LUX framework, aside from the data model abstraction level. Representation and Display: For PIA, the implementation of Leaflet provided an immediate and easy-to-integrate viewer to display high-resolution digitised images of CAS photographs. The context is accessible through accompanying metadata and related links on the GUI. Although not LOUD-driven per se, it functions as a mediator through the IIIF Image API. Balancing between immediacy and hypermediacy, the Mirador instance enabled the display of IIIF Presentation API resources with machine-generated annotations. We also incorporated the V3 plug-in to manipulate images[119]. However, we failed to provide a robust authentication layer allowing users to add their own annotations easily, highlighting the limitations of a four-year research project not primarily aimed at tool development but at proposing a participatory system. IIIF-compliant software can aid in this, yet development needs to be community-driven rather than individualised. Exhibit was the only tool used for educational and teaching purposes that was well-received, though integration issues persisted. LUX exemplifies Linked Art hypermediacy, where the structure of the JSON-LD representation drives their GUI, including URL syntax[120]. For both PIA and LUX, JSON-LD serves as an interface for certain users (software developers, data curators, data scientists). While resources can become BOs depending on the viewer, a few inconsistencies can still be overcome and will likely be understood by humans reading the files. Data Model: The data model of IIIF is primarily driven by its design principles and WADM. Moreover, the main unit is the Manifest, often a digitisation or representation of a physical object, meaning the Presentation API is key to achieving an acceptable level of interoperability (a minimal example is given below). The Shared Canvas Data Model is still there, baked into the specifications, but from Version 2 of the main APIs onwards one does not really need to know about it to understand how IIIF works; it remains a piece of history, though. IIIF is Linked Data, but it has no real semantic value and should not be treated as RDF triples. One could almost say the same about Linked Art, as it is not necessary to fully understand CIDOC-CRM to either start using its model or to deploy an API endpoint. However, some basics do need to be understood, such as the event-based model viewpoint, the important classes and their rdfs:domain and rdfs:range. That said, Linked Art has bent some rules and created some properties and classes to meet the needs of the community. As far as implementation goes, I would suggest implementing the Linked Art API endpoints directly and consistently, rather than starting with the data model, as I see cross-institutional interoperability through interfaces as a more important milestone than data modelling as a pastime for a few specialists.
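To illustrate why the Manifest is the key unit of interoperability, the following is a minimal, hypothetical single-image IIIF Presentation API 3.0 Manifest; the URIs are placeholders, and any compliant viewer such as Mirador should be able to render it.

```json
{
  "@context": "http://iiif.io/api/presentation/3/context.json",
  "id": "https://example.org/iiif/1/manifest.json",
  "type": "Manifest",
  "label": { "en": ["Example single-image Manifest"] },
  "items": [
    {
      "id": "https://example.org/iiif/1/canvas/p1",
      "type": "Canvas",
      "height": 3000,
      "width": 2000,
      "items": [
        {
          "id": "https://example.org/iiif/1/page/p1",
          "type": "AnnotationPage",
          "items": [
            {
              "id": "https://example.org/iiif/1/annotation/p1-image",
              "type": "Annotation",
              "motivation": "painting",
              "body": {
                "id": "https://example.org/iiif/images/photo1/full/max/0/default.jpg",
                "type": "Image",
                "format": "image/jpeg",
                "height": 3000,
                "width": 2000
              },
              "target": "https://example.org/iiif/1/canvas/p1"
            }
          ]
        }
      ]
    }
  ]
}
```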
For both PIA and LUX, dedicated data models were developed to be consistent with the specifications, with some internal structure and data in LUX for their own purposes. Beyond what is already exposed through the Omeka S JSON-LD API, semantic data in PIA did not go further than the templating of a few Linked Art resources (a sketch of such a resource is given at the end of this section) and the workflow established with the University of Oxford; there is no production PIA Linked Art API at the moment. Infrastructure: Serialisation mock-ups and JSON-LD templates on GitHub were the starting point to model IIIF Manifests and Collections for the PIA research project. Laravel and then Omeka S were the two main elements, in two different iterations, that were leveraged to present the IIIF resources. While single-image Manifests were quite easy to serialise, the integration of more detailed representations of photographic albums presented challenges. A more robust infrastructure is definitely needed for the long run, though the current one proved efficient enough in a laboratory setting. Algorithmic and Computational Processes: PIA relied on virtual machines and embedded the necessary Kakadu licence in our SIPI instance to encode the images. While the former proved difficult, as performance was sometimes an issue, the latter was a good option: serving JPEG2000 images cannot currently rely on FLOSS solutions, which are still too slow. The LUX pipeline and the use of MarkLogic as a multi-modal database are examples of the data engineering expertise and outsourcing solutions required for such a platform. Some open source solutions, such as QLever[121], a high-performance SPARQL engine, may also offer some hope to institutions that are not well-funded and need robust knowledge graph-oriented solutions. The dual simplicity and complexity of implementing LOUD specifications and participating in community-led efforts can be attributed to the need for a reorientation of research projects. It is essential for these projects to actively engage in community processes rather than intermittently presenting their progress and subsequently withdrawing. This ongoing engagement fosters a more robust and collaborative environment, ultimately contributing to the advancement of shared goals and standards. Such a reorientation necessitates a fundamental change in how universities and GLAM institutions operate, extending their involvement beyond the immediate project scope to ensure sustained participation and impact. Despite the introduction of LOUD, the perception of Linked Data has not evolved significantly. Most software engineers continue to treat resources primarily as JSON, often overlooking the graph structure that underpins Linked Data. For IIIF, this approach is appropriate given its focus on content interoperability and presentation. However, for Linked Art, overlooking the graph structure could be problematic to some extent, as it limits the full realisation of the semantic relationships and rich interconnections that Linked Data can provide. This highlights the need for more focused efforts to integrate semantic web principles, particularly in contexts where these principles can significantly improve the quality of data. I have faced challenges in moving many of the models developed within PIA into (beta) production, and the usability requirements of APIs have scarcely been addressed. However, the findings from this thesis should be viewed as starting points rather than conclusive solutions.
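For comparison, a Linked Art resource of the kind templated in PIA could look like the sketch below: a HumanMadeObject with a name and an event-based production, serialised as JSON-LD. The identifiers and dates are illustrative only, not taken from the actual PIA templates.

```json
{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "https://example.org/object/2",
  "type": "HumanMadeObject",
  "_label": "Example album page",
  "identified_by": [
    { "type": "Name", "content": "Album page with two photographs" }
  ],
  "produced_by": {
    "type": "Production",
    "timespan": {
      "type": "TimeSpan",
      "begin_of_the_begin": "1900-01-01T00:00:00Z",
      "end_of_the_end": "1910-12-31T23:59:59Z"
    }
  }
}
```

Because the event-based structure mirrors CIDOC-CRM, such a record can be consumed as a graph where needed while remaining readable JSON for software developers.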
The unseen aspect of this dissertation is my active involvement in both communities and my attempts to reciprocate this engagement within PIA. Each investigation presented could have warranted a dedicated thesis, indicating the breadth and depth of the topics explored. Ultimately, this work merely scratches the surface of numerous subjects, laying the groundwork for future research and development. The next section will offer a retrospective on the work accomplished during this PhD thesis. It will reflect on the various milestones achieved, the lessons learned, and the potential directions for future research. 9.3 Retrospective: Trudging like an Ant In this retrospective[122], I will offer an analysis of the research journey. This section will interpret the findings to situate the LOUD standards as fully-fledged actors within the CH field. It will reflect on the challenges, achievements, and lessons learned throughout the research process, providing a holistic view of the project’s trajectory and its implications for the future of LOUD. The empirical findings of my research reveal the nuanced interplay between socio-technical practices and implementations, synthesising insights through both thematic and abstract lenses. This dual approach underscores the importance of fostering collaboration and effective decision-making, while addressing biases and promoting inclusivity. The need for ongoing maintenance, interoperability and usability remains paramount, as does the development of educational resources and consortia to sustain initiatives. In addition, capturing digital materiality and addressing scalability challenges are critical to the widespread integration of LOUD standards. These findings lay the groundwork for future research and development aimed at bridging operational applications with more extensive design approaches. How can the LOUD standards be situated as fully-fledged actors within the CH field? Reflecting on the notion of graceful degradation, frequently mentioned during the 2024 IIIF Conference, LOUD specifications embody this concept perfectly. Even if not all embedded patterns of a given API-compliant resource are correctly interpreted or rendered by a client, some of its basic features should still be displayed. This flexibility is crucial for ensuring the broad usability and adaptability of LOUD standards, allowing them to transcend institutional boundaries and serve as robust mediums of knowledge transfer. To paraphrase @poupeau_reflexions_2018’s quote at the beginning of this chapter, there is no single model for interoperability, but there are certainly socio-technical best practices to be learned from IIIF and Linked Art. The act of participation prevails over the relatively easy and one-off deployment of specifications for the short term. By using LOUD, CH data can be effectively interlinked with different datasets, resulting in numerous potential benefits. An overriding benefit is the improved discoverability and accessibility of CH resources, facilitating enhanced search and retrieval capabilities. In addition, the adoption of LOUD promotes seamless data sharing and reuse within academic and memory institutions, fostering a culture of collaboration and interdisciplinary knowledge exchange. This approach not only enhances the overall utility and comprehensiveness of CH repositories, but also promotes collective understanding and appreciation of diverse cultural assets and historical narratives. However, it is essential to critically evaluate the application of LOUD in the context of CH data.
While LOUD offers promising prospects for improved data interlinking and accessibility, challenges and concerns persist. The transition to LOUD principles necessitates significant investments in resources, including infrastructure, expertise, and time, which may pose barriers for smaller institutions or those with limited funding. Moreover, ensuring the accuracy, consistency, and quality of Linked Data is a complex task, demanding meticulous attention to detail and ongoing maintenance efforts. Furthermore, issues related to data ownership, rights management, and the possible misuse or misinterpretation of interconnected data should be carefully considered. Standardisation across different CH domains, each with unique data structures, formats, and contexts, may present formidable obstacles to seamless integration. These concerns underscore the need for a nuanced and cautious approach to the implementation of LOUD standards, taking into account the complexity and specificity of CH data and their diverse custodians. This thesis has been a journey of discovering Linked Art and a confirmation that the ethos of IIIF is yet to be fully manifested beyond product implementation. The sense of belonging to a community is an ongoing endeavour, much like the ants in Latour’s metaphor. This dissertation underscores that active participation in community processes is essential to achieving the dual goals of advancing the technological framework for semantic interoperability and fostering an inclusive and collaborative CH ecosystem. 10. Conclusion For a better understanding of the past, Our images have to be enhanced, A new dialogue in three dimensions, Must have openness at its heart, For somewhere within the archive Of our aggregated minds Are a multitude of questions And a multitude of answers, Simply awaiting to be found. [@mr_gee_day_2023] This chapter brings to a close the journey undertaken since February 2021, aiming to clearly articulate the answers to the research questions, discuss how the research aligns with the objectives, elucidate the significance of the work, outline its shortcomings, and suggest avenues for future research. I had the privilege of hearing the above poem at EuropeanaTech in The Hague in October 2023. What struck me most, and what I have tried to convey in this thesis, was the powerful dialogue and collective spirit striving to harness the potential of our (digital) heritage. With a sense of conviction after this conference, I approached the next one in Geneva in February 2024 with confidence, believing that I had made a compelling case for the concept of LOUD. When a participant asked how LOUD differed from Linked Data, however, I found myself explaining the socio-technical ethos of IIIF and Linked Art, the richness of the individuals who make them up, the ability to combine these different standards, and the common use cases that emerge from these collaborations. Whether my answer was convincing remains uncertain, but I knew it was too brief. Perhaps it is here, in this conclusion, that my thoughts can find their full expression. I believe that LOUD should be at the forefront of efforts to improve the accessibility and usability of CH data, an endeavour that is increasingly relevant in a web-centric environment. This paradigm has gained considerable traction, particularly with the advent of Linked Art and the recognition that the IIIF Presentation API has been an inspiration for the LOUD design principles.
The development and maintenance of LOUD standards by dedicated communities are characterised by collaboration, consensus building, and transparency. In the interstices of the IIIF and Linked Art communities, frameworks for interoperability are not only exposed, but revealed as profound testaments to the power of transparent collaboration across institutional boundaries. Both communities, it is true, are still very much Anglo-Saxon efforts, where the specifications have mainly been implemented in GLAM and/or DH research projects, or at least in those we have been aware of. Each community has clear guidelines on how to propose use cases, mostly using GitHub, and hides the sometimes unnecessary RDF complexity behind a set of JSON-LD @context definitions (a short sketch is given at the end of this section). IIIF is at the presentation layer and can really play its role as a mediator, with the Manifest as its central unit connecting other specifications, including semantic metadata, and preferably with simpatico specifications such as Linked Art. An important hypothesis arises from the observation that adherence to the LOUD design principles makes specifications more likely to be adopted. The primary benefit of adopting LOUD standards lies in their grassroots nature. This grassroots approach not only aligns with the core values of openness and collaboration within the DH community but also serves as a common denominator between DH practitioners and CHIs. This unique alignment fosters a sense of shared purpose and common ground. However, it is essential to acknowledge that while LOUD and its associated standards, including IIIF, hold immense promise, their limited recognition in the wider socio-technical ecosystem may currently hinder their full potential impact beyond the CH domain. Consideration of socio-technical requirements and the promotion of digital equity are essential to the development of specifications in line with the LOUD design principles. In the context of the IIIF and Linked Art communities, this means both recognising current challenges and building on existing practices. This includes forming alliances that support diverse forms of inclusion at both project and individual levels. For example, organisations should be encouraged to send representatives from diverse professional and personal backgrounds, such as underrepresented groups or non-technical fields. This can be facilitated by initiatives that lower the barriers to participation, such as financial support for travel and attendance, flexible participation formats, and targeted outreach efforts. Furthermore, as these standards often align with open government data initiatives, they present opportunities for broader public engagement and institutional transparency. In the broader context of DH, understanding LOUD involves tracing the historical development of the field and its evolving relationship with technology. The interdisciplinary nature of DH has always integrated diverse scholarly and technical practices. In recent years, DH has seen a notable increase in interest in the use of Linked Data and semantic technologies to improve the discoverability and accessibility of CH collections. LOUD's emphasis on user-centred design and usability aligns well with these goals. Consequently, the principles of LOUD hold great promise for advancing the integration and use of community-driven APIs and/or Linked Data within DH.
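The following hedged sketch shows how a JSON-LD @context hides RDF complexity: the context maps developer-friendly keys to full RDF terms, so the document reads as plain JSON while still expanding to triples. The vocabulary choices here are illustrative only.

```json
{
  "@context": {
    "id": "@id",
    "type": "@type",
    "label": "http://www.w3.org/2000/01/rdf-schema#label"
  },
  "id": "https://example.org/object/1",
  "type": "http://www.cidoc-crm.org/cidoc-crm/E22_Human-Made_Object",
  "label": "Example object"
}
```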
This can be seen within PIA, where the benefits of implementing IIIF helped us to streamline machine-generated annotations, integrate different thumbnails into GUI prototypes, model photo albums with different layers from the Kreis Family collection, and enable project members and students to engage in digital storytelling, an important participatory facet that can be seamlessly explored by DH efforts and CHIs with the help of the IIIF Image and Presentation APIs. Data reuse is definitely a key LOUD driver, and it could have been explored more extensively with a production instance of Linked Art. As for widening participation, this is a strategic and political decision rather than a technical one. That said, LOUD specifications can be embedded through strategic citizen science initiatives. A recent example that highlights the comprehensive value of Linked Data was presented by @newbury_linked_2024 at the CNI Spring 2024 Meeting. He delineated its significance as extending well beyond single entities, such as the Getty Research Institute, to enrich a vast ecosystem. Specifically, he identified three principal areas of value: firstly, within the ecosystem itself, where the utility of information is amplified through its application in diverse contexts; secondly, for the audience, by directly addressing user needs and facilitating various conceptual frameworks; and finally, within the community, by enabling wider use and adaptation of data and code. This approach to Linked Data, as articulated by Newbury, not only enhances its utility across these dimensions, but also aligns seamlessly with the LOUD proposition, underscoring a shared vision for a digital space where the interconnectedness and accessibility of (meta)data serve as foundational principles for progress and community engagement. LUX, as a catalyst for LOUD, exemplifies a practical approach to implementing Linked Data that has garnered significant local engagement and support at Yale. This initiative demonstrates how sound socio-technical practices can be effectively applied within a supportive institutional environment. The consistency of the data within LUX aligns well with IIIF and Linked Art standards, with only a few minor adjustments required for full compliance. These quick fixes are manageable and do not detract from the overall robustness of the initiative. While it may be too early to fully assess the wider impact of using LOUD specifications on the LUX platform within the CH domain, the initiative has already attracted considerable interest in recent months. This growing attention indicates that the LUX approach is resonating with other organisations and points to the potential for wider adoption and impact. The enthusiastic local engagement at Yale provides a strong foundation for LUX and highlights its potential to serve as a model for similar projects aimed at enriching digital heritage through effective collaboration and agreed-upon standards. In carrying out this thesis, I have adhered to the five main objectives set out at the beginning of the PhD. These objectives have been accomplished to a high degree, reflecting a substantial and well-executed project. Furthermore, most of the outputs – such as data models and scripts – from this work are available on GitHub, providing open access to the wider community. In addition, I have published several papers, both individually and collaboratively, further disseminating the findings and contributions of this research.
Additionally, this thesis is relevant because it sheds light on communities and implementations that can be celebrated not only for their standards but also for their operating ethos; IIIF and Linked Art present models ripe for emulation beyond their immediate digital confines. Here, agency and authority are most typically granted to the collective over the isolated, with each actor – be it an individual, an institution or an interface – intricately interconnected. Yale’s LUX initiative also embodies this ethos, demonstrating how collaborative efforts can lead to innovative solutions and wider impact. It is to be hoped, then, that these practices of openness and multiple partnerships will not be seen as limited to their origins in digital representation. At the very least, I hope that these socio-technical approaches can serve as exemplars or sources of inspiration in broader arenas, where the principles of mutual visibility and concerted action can point the way towards cohesive and adaptive collaborative architectures. Despite its contribution, this thesis is far from perfect and certainly contains several shortcomings. I will name here three significant ones. First, the visualisations included and the use of FOL are primarily designed to support my own self-reflection and may be more beneficial to me than to the broader academic community. While they provide insights into my research process and findings, their applicability and usefulness to others might be limited. Second, the theoretical framework I employed, while instrumental to my research, may not serve as a universally applicable toolbox. Nevertheless, I urge readers to pay close attention to STS methodologies and practices. The works of Bruno Latour, Donna Haraway, and Susan Leigh Star have been invaluable companions throughout this dissertation. Additionally, for those involved in conceptualising semantic information, I recommend exploring Floridi’s PI, which offers profound insights into the nature and dynamics of information. These readings have greatly influenced my approach and understanding, and I believe they can offer valuable perspectives to others as well. Third, while the thesis aims to address both community practices and semantic interoperability, it leans more heavily towards the former. This emphasis on community practices may overshadow the broader discussion of semantic interoperability, potentially limiting the appeal of the thesis to those primarily interested in the technical aspects. Other shortcomings include the broad scope of the thesis, with three empirical chapters exploring different avenues. While this comprehensive approach provides a broad understanding of the research topic, it has also resulted in a rather lengthy thesis. This may be a challenge for the reader, as a topic of interest in one chapter may not be as compelling in another. The diversity of empirical focus, while enriching the research, may dilute the coherence for some readers, making it more difficult to maintain a consistent engagement throughout the dissertation. Despite these limitations, I hope that the different perspectives and findings contribute to a richer, more nuanced understanding of LOUD for CH. Avenues for future research are numerous and promising. One interesting area to explore is the comparative benefits experienced by early adopters of IIIF and Linked Art specifications versus those who implemented these standards later.
Early adopters have the advantage of having their use cases discussed and resolved within the community, and it would be insightful to analyse the long-term impacts on their projects. Such a study is already feasible for early adopters of IIIF, and comparing later implementations of Linked Art will become possible within a few years. Furthermore, future exploration could focus on the full implementation of Linked Art within PIA or similar efforts, as well as more performance-oriented testing with the deployed LOUD APIs. These efforts should further validate the robustness and scalability of these specifications. Another important area for future investigation is the participation of institutions and individuals from the Global South in both the IIIF and Linked Art communities. It is crucial to explore how we can better support their uptake of these specifications and encourage their active involvement in these initiatives to ensure a more inclusive and globally representative environment. As I reflect on the journey of this thesis, I am reminded of the powerful dialogue and collective effort that has been at its heart. Mr Gee’s poem resonates deeply with my own aspirations for this work: to enhance our understanding of the past through openness and collaboration, as can be seen in IIIF and Linked Art. As I bring this dissertation to a close, I am filled with a sense of accomplishment and a renewed commitment to promoting sound socio-technical practices. It is my hope that the insights and methodologies presented here will inspire others to engage in this ongoing dialogue, continually asking and answering the many questions that arise as we collectively explore our cultural heritage landscapes. Throughout this dissertation, British English spelling conventions are predominantly observed. However, there are instances of American English spelling where direct quotations from sources are used as well as when referring to names of institutions, standards, or concepts. ↩︎ SNSF Data Portal - Grant number 193788: https://data.snf.ch/grants/grant/193788 ↩︎ Seminar für Kulturwissenschaft und Europäische Ethnologie: https://kulturwissenschaft.philhist.unibas.ch/ ↩︎ DHLab: https://dhlab.philhist.unibas.ch/ ↩︎ HKB: https://www.hkb.bfh.ch/ ↩︎ The considerable size of the ASV collection, which includes over 90,000 analogue objects, reflects not just the work of the main authors but also the contributions from numerous explorers and additional material beyond the maps and primary publications. ↩︎ Max Frischknecht’s PhD: https://phd.maxfrischknecht.ch/ ↩︎ PIA project website: https://about.participatory-archives.ch/ ↩︎ The vision of the PIA project was first written in German and then translated into English and French. ↩︎ In our joint paper, we wrote ‘man-made’, corrected here, which makes me think of the transition within the CIDOC-CRM for the Entity E22 Human-Made Object from version 6.2.7 onward. ↩︎ Knora Base Ontology: https://docs.dasch.swiss/2023.07.01/DSP-API/02-dsp-ontologies/knora-base/ ↩︎ SIPI documentation: https://sipi.io/ ↩︎ IIIF Working Groups Meeting, The Hague, 2016: https://iiif.io/event/2016/thehague/ ↩︎ Van Gogh, Vincent. (1889). Irises [Oil on canvas]. Getty Museum, Los Angeles, CA, USA. https://www.getty.edu/art/collection/object/103JNH ↩︎ Giacometti, Alberto. (1956). L’homme qui marche I [Sculpture]. Carnegie Museum of Art, Pittsburgh, PA, USA.
https://www.wikidata.org/entity/Q706964 ↩︎ UNESCO World Heritage List: https://whc.unesco.org/en/list/ ↩︎ Blue Shield International: https://theblueshield.org/ ↩︎ The ICBS was founded by the ICA, ICOM, ICOMOS, and IFLA. ↩︎ Guro. (1900-1950). Male Face Mask (Zamble) [Wood and pigment]. Art Institute of Chicago, Chicago, IL, USA. https://www.artic.edu/artworks/239464 ↩︎ I have opted for the term ‘affordance’ and not ‘representation’ as my intention is to maintain a comprehensive scope that encompasses various modalities such as modelling endeavours. ↩︎ To some degree, parallels can be drawn between the distinctions of cultural and digital heritage with those drawn between the humanities and DH. ↩︎ Inicio - Museos Comunitarios de América: https://www.museoscomunitarios.org/ ↩︎ The descriptions of each of these nine dimensions are selected excerpts from @star_ethnography_1999. ↩︎ A PID is a long-lasting reference to a digital resource. It usually has two components: a unique identifier and a service that locates the resource over time, even if its location changes. The first helps to ensure the provenance of a digital resource (that it is what it purports to be), whilst the second will ensure that the identifier resolves to the correct current location [@digital_preservation_coalition_persistent_2017]. ↩︎ Rijksmuseum: https://www.rijksmuseum.nl/ ↩︎ In the original version, these instances contained typographical or factual errors. They have been struck through and corrected here. ↩︎ @zeng_metadata_2022 [p. 11] articulate that ‘as with “data”, metadata can be either singular or plural. It is used as singular in the sense of a kind of data; however, in plural form, the term refers to things one can count’. In the context of this thesis, I have chosen to favour the plural form of (meta)data. However, I acknowledge that I may occasionally use the singular form when referring to the overarching concepts or when quoting references verbatim. ↩︎ The snapshot of this bibliographic record was taken from https://swisscovery.slsp.ch/permalink/41SLSP_UBS/11jfr6m/alma991170746542405501. ↩︎ Seeing Standards: A Visualization of the Metadata Universe. 2009-2010. Jenn Riley. https://jennriley.com/metadatamap/seeingstandards.pdf ↩︎ A widespread example in the CH domain is the serialisation of metadata in XML, a W3C standard. ↩︎ It is noteworthy that the diversity of metadata standards in the heritage domain, characterised primarily by a common emphasis on descriptive attributes, is not counter-intuitive. This variation reflects the diverse nature of CH resources and the nuanced needs of GLAMs. ↩︎ MARC Standards: https://www.loc.gov/marc/ ↩︎ RDA: https://www.loc.gov/aba/rda/ ↩︎ Though RDA was initially envisioned as the third edition of AACR, it faces the challenge of maintaining a delicate balance between preserving the AACR tradition while embracing the necessary shifts required for a successful and relevant future for library catalogues that can easily be interconnected with standards from archives, museums, and other communities [see @coyle_resource_2007]. ↩︎ MODS: https://www.loc.gov/standards/mods/ ↩︎ METS: https://www.loc.gov/standards/mets/ ↩︎ People might even argue that FRBR is only interesting as an ‘intellectual exercise’ [@zumer_functional_2007 p. 27].
↩︎ LRMer: https://www.iflastandards.info/lrm/lrmer ↩︎ BibFrame: https://www.loc.gov/bibframe/ ↩︎ EAD: https://www.loc.gov/ead/ ↩︎ ISAD(G): General International Standard Archival Description - Second edition: https://www.ica.org/en/isadg-general-international-standard-archival-description-second-edition ↩︎ PREMIS: https://www.loc.gov/standards/premis/ ↩︎ RiC Conceptual Model: https://www.ica.org/en/records-in-contexts-conceptual-model ↩︎ RiC-O: https://www.ica.org/standards/RiC/ontology ↩︎ CDWA: https://www.getty.edu/research/publications/electronic_publications/cdwa/ ↩︎ CCO: https://www.vraweb.org/cco ↩︎ VRA: https://www.vraweb.org/ ↩︎ VRA Core 4.0 and CCO have a symbiotic relationship, with CCO providing data content guidelines and incorporating the VRA Core 4.0 methodology. The latter has also been leveraged in other contexts to form the basis for more granular Linked Data vocabularies [see @mixter_using_2014]. ↩︎ In French, the original language used for this acronym, CIDOC stands for Comité international pour la documentation du Conseil international des musées. ↩︎ LIDO: https://cidoc.mini.icom.museum/working-groups/lido/ ↩︎ CIDOC Working Groups: https://cidoc.mini.icom.museum/working-groups/ ↩︎ CIDOC-CRM: https://cidoc-crm.org/ ↩︎ CRM-SIG Meetings: https://www.cidoc-crm.org/meetings_all ↩︎ CIDOC-CRM V7.1.2: https://www.cidoc-crm.org/html/cidoc_crm_v7.1.2.html ↩︎ For a quick overview of the classes and properties of CIDOC-CRM, I recommend visiting the dynamic periodic table created by Remo Grillo (Digital Humanities Research Associate at I Tatti, Harvard University Center for Italian Renaissance Studies): https://remogrillo.github.io/cidoc-crm_periodic_table/ ↩︎ CIDOC-CRM compatible models and collaborations: https://www.cidoc-crm.org/collaborations ↩︎ At the time of writing, none of these CIDOC-CRM extensions has been formally approved by CRM-SIG. It is also worth mentioning that other extensions based on CIDOC-CRM have been developed by the wider community, such as Bio CRM, a data model for representing biographical data for prosopographical research [see @tuominen_bio_2017], or ArchOnto, which is a model created for archives [see @hall_archonto_2020]. ↩︎ CRMact: https://www.cidoc-crm.org/crmact/ ↩︎ CRMarchaeo: https://cidoc-crm.org/crmarchaeo/ ↩︎ CRMba: https://www.cidoc-crm.org/crmba/ ↩︎ CRMdig: https://www.cidoc-crm.org/crmdig/ ↩︎ CRMgeo: https://www.cidoc-crm.org/crmgeo/ ↩︎ CRMinf: https://www.cidoc-crm.org/crminf/ ↩︎ CRMsci: https://www.cidoc-crm.org/crmsci/ ↩︎ CRMsoc: https://www.cidoc-crm.org/crmsoc/ ↩︎ CRMtex: https://www.cidoc-crm.org/crmtex/ ↩︎ FRBRoo: https://www.cidoc-crm.org/frbroo/ ↩︎ PRESSoo: https://www.cidoc-crm.org/pressoo/ ↩︎ Linked Art: https://linked.art ↩︎ DCMI Metadata Terms: https://www.dublincore.org/specifications/dublin-core/dcmi-terms/ ↩︎ Getty Vocabularies: https://www.getty.edu/research/tools/vocabularies/ ↩︎ Mastodon: https://joinmastodon.org/ ↩︎ Homosaurus: https://homosaurus.org/ ↩︎ DOLCE: http://www.loa.istc.cnr.it/dolce/overview.html ↩︎ It must be noted though that the use of DLs in KR predates the emergence of ontological modelling in the context of the Web, with its origins going back to the creation of the first DL modelling languages in the mid-1980s [@krotzsch_description_2013]. ↩︎ LinkedDataGPT: https://ld.gpt.liip.ch/ ↩︎ Neo4j: https://neo4j.com/ ↩︎ GB and PB are units of digital information storage capacity.
1 GB is equal to 1,000,000,000 ($10^{9}$) bytes, 1 TB is equal to 1,000,000,000,000 ($10^{12}$) bytes, and 1 PB is equal to 1,000,000,000,000,000 ($10^{15}$) bytes. If a standard high-definition movie is around 4-5 GB, then 1 PB could store roughly 200,000 movies. In 2011, @gomes_survey_2011 [p. 414] reported that the Internet Archive held 150,000 million contents of archived websites – crawled through the Wayback Machine – or approximately 5.5 PB. As of December 2021, it was about 57 PB of archived websites and a total used storage of 212 PB, see https://archive.org/web/petabox.php. ↩︎ In this context, UX is understood as an umbrella term encompassing both user and/or customer service, emphasising that the focus is on individuals who need or use a given service, regardless of their categorisation as users or customers. ↩︎ According to @nargesian_data_2019 [p. 1986], a data lake is a vast collection of datasets that has four characteristics. It can be stored in different storage systems, exhibit varying formats, may lack useful metadata or use differing metadata formats, and can change autonomously over time. ↩︎ An interesting initiative in this area is the use of RAIL, which empowers developers to restrict the use of AI on the software they develop to prevent irresponsible and harmful applications: https://www.licenses.ai/ ↩︎ Common Objects in Context: https://cocodataset.org/ ↩︎ Viscounth – A Large Dataset for Visual Question Answering for Cultural Heritage: https://github.com/misaelmongiovi/IDEHAdataset ↩︎ Artificial Intelligence for Libraries, Archives & Museums: https://sites.google.com/view/ai4lam ↩︎ AEOLIAN Network: https://www.aeolian-network.net/ ↩︎ Newspaper Navigator: https://news-navigator.labs.loc.gov/ ↩︎ @perrigo_exclusive_2023 reported that Kenyan workers were paid less than USD 2 an hour to identify and filter out harmful content for ChatGPT. ↩︎ FOSTER Plus (Fostering the practical implementation of Open Science in Horizon 2020 and beyond) was a 2-year EU-funded project initiated in 2017 with 11 partners across 6 countries. Its main goal was to promote a lasting shift in European researchers’ behaviour towards Open Science becoming the norm. ↩︎ The Open Knowledge Foundation, a non-profit network established in 2004 in the U.K. that aims to promote the idea of open knowledge, sets out principles around the concept of openness and defines it as follows: ‘Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness)’. https://opendefinition.org/ ↩︎ Phronesis in philosophy is related to ‘practical understanding; wisdom, prudence; sound judgement’ [@oxford_english_dictionary_phronesis_2023]. ↩︎ Zooniverse: https://www.zooniverse.org/ ↩︎ FromThePage: https://fromthepage.com/ ↩︎ FAIR Principles: https://www.go-fair.org/fair-principles/ ↩︎ FAIR Signposting: https://signposting.org/FAIR/. Signposting focuses on expressing the topology of digital objects on the web with a view to increasing the FAIRness of scholarly objects in a distributed manner [@van_de_sompel_fair_2023].
↩︎ CARE Principles for Indigenous Data Governance: https://www.gida-global.org/care ↩︎ The Santa Barbara Statement on Collections as Data: https://collectionsasdata.github.io/statement/ ↩︎ British Library’s Research Repository: https://bl.iro.bl.uk/ ↩︎ Data Foundry – Data collections from the National Library of Scotland: https://data.nls.uk/ ↩︎ LoC Labs Data Sandbox: https://data.labs.loc.gov/ ↩︎ Royal Danish Library’s Mediestream: https://www2.statsbiblioteket.dk/mediestream/avis ↩︎ Meemoo’s Art in Flanders: https://artinflanders.be/ ↩︎ BVMC Labs: https://data.cervantesvirtual.com/ ↩︎ DATA-KBR-BE: https://www.kbr.be/en/projects/data-kbr-be/ ↩︎ A Checklist to Publish Collections as Data in GLAM Institutions: https://glamlabs.io/checklist/ ↩︎ The birth of the Web: https://home.cern/science/computing/birth-web ↩︎ All general-purpose servers must support the methods GET and HEAD. All other methods are optional. ↩︎ Schema.org: https://schema.org/ ↩︎ [@wood_linked_2014 p. 35] ↩︎ FOL, also known as first-order predicate logic or first-order predicate calculus, is a formal system of symbolic logic used in mathematics, philosophy, and computer science. It is a logical framework for expressing and reasoning about statements involving objects and their properties and relationships. In FOL, statements are represented using variables, constants, functions, and predicates. It allows for the quantification of variables and the formulation of statements such as ∀ (for all) and ∃ (there exists), which enable the expression of universal and existential quantification. As such, FOL can express facts concerning some or all of the objects in the universe. Its epistemological commitment, i.e. what an agent believes about facts, is concentrated on what is true, false, or unknown [see @russell_artificial_2010 pp. 285 ff.] ↩︎ It must be noted that DL, a subset of FOL briefly introduced in 3.2.4.4, has a more restricted syntax and semantics tailored for ontology modelling. ↩︎ IRI is an extension of URI that allows for the use of international characters and symbols in web addresses. ↩︎ JSON-LD will be discussed through examples later in the dissertation. ↩︎ It is worth noting that although archives and museums have standardised their metadata practices later than libraries, they seem to have identified core models that can be more easily implemented to meet Linked Data principles. ↩︎ Author’s translation: ‘We need to give up on the idea of syntactic or structural interoperability through the use of a single model, whether for producing, storing or managing data within an information system’. ↩︎ CHAOSS: https://chaoss.community/ ↩︎ IIIF Annual Conference and Showcase - Los Angeles, CA, USA - June 4-7, 2024: https://iiif.io/event/2024/los-angeles/ ↩︎ Micrio: https://micr.io/ ↩︎ Message written on the IIIF Slack Workspace on 28 October 2022. ↩︎ mirador-image-tools: https://github.com/ProjectMirador/mirador-image-tools ↩︎ For instance, this user interface view of Claude Monet (1840-1926): https://lux.collections.yale.edu/view/person/642a0152-1567-4fbe-93f3-66f11c5cab9a and its Linked Art counterpart: https://lux.collections.yale.edu/data/person/642a0152-1567-4fbe-93f3-66f11c5cab9a ↩︎ QLever: https://github.com/ad-freiburg/qlever ↩︎ The title of the section is an homage to Bruno Latour and a passage found in his book ‘We have never been modern’. ↩︎",
"date_published": "2024-11-18T00:00:00.000Z"
}
diff --git a/feed.xml b/feed.xml
index 70a163c..375f91a 100644
--- a/feed.xml
+++ b/feed.xml
@@ -43,7 +43,7 @@
https://phd.julsraemy.ch/thesis.html
Since its inception in 2011, IIIF has revolutionised[1] the accessibility of image-based resources. Initially driven by the needs of manuscript scholars, IIIF focused on two-dimensional images, but has since expanded to encompass a wide range of image-based resources, including audiovisual materials and, in the near future, 3D images. Similarly, Linked Art, formally established in 2017, initially concentrated on art museum objects but has since broadened its scope to model a variety of CH entities, leveraging CIDOC-CRM, a renowned ontology in the museum and DH space. Both initiatives aim to break down silos and enhance the dissemination of digital objects: IIIF focuses on improving their presentation, Linked Art on their semantic description. Together, they make CH data more accessible through IIIF and more meaningful to machines through Linked Art. These efforts have primarily benefited the CH domain.
A key commonality is that the main APIs these communities create align with the LOUD design principles, either intentionally or empirically demonstrated through use cases. These principles enable software developers to develop compliant tools and services without needing to fully understand RDF, a framework for representing information on the web. Additionally, they may not need to grasp all LOD principles, which promote the interlinking of data from diverse datasets using KOS such as thesauri. WADM, a W3C standard, is also recognised as a LOUD specification. It provides a framework for creating interoperable annotations on web resources, facilitating the linking and sharing of data across different platforms and applications (a minimal sketch follows below). These LOUD design principles include the right abstraction for the audience, few barriers to entry, comprehensibility by introspection, documentation with working examples, and the use of many consistent patterns rather than few exceptions. Additionally, both IIIF and Linked Art are driven by vibrant communities, mainly comprising GLAM and higher education institutions.
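As an illustration, a minimal WADM annotation might look like the following sketch, tagging a region of a IIIF Canvas with a free-text body; the identifiers are hypothetical.

```json
{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "id": "https://example.org/anno/1",
  "type": "Annotation",
  "motivation": "tagging",
  "body": {
    "type": "TextualBody",
    "value": "chalet",
    "language": "en"
  },
  "target": {
    "source": "https://example.org/iiif/1/canvas/p1",
    "selector": {
      "type": "FragmentSelector",
      "conformsTo": "http://www.w3.org/TR/media-frags/",
      "value": "xywh=100,100,400,300"
    }
  }
}
```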
-While the standards and principles discussed have broad applications, it is important to clarify the scope of this dissertation. This work does not focus on KGs by assessing triplestores – databases specifically designed to store and retrieve triples, which are the fundamental data structures in RDF. Similarly, it does not deal with evaluating SPARQL engines, which are specifically designed to query KGs. Additionally, this dissertation does not address the intersection of ML and IIIF, or the ontological reasoning of Linked Art.
+While the standards and principles discussed have broad applications, it is important to clarify the scope of this dissertation. This work does not focus on KGs by assessing triplestores – databases specifically designed to store and retrieve triples, which are the fundamental data structures in RDF. Similarly, it does not deal with evaluating SPARQL engines, which are specifically designed to query KGs. Additionally, this dissertation does not address the intersection of ML and IIIF, or the ontological reasoning of Linked Art.
Instead, this dissertation concentrates on LOUD, the consistency of its standards, design principles and the vibrant communities behind it. It examines JSON-LD serialisation efforts and the crucial intersection required to establish robust semantic interoperability baselines between presentation and semantic layers. It also presents real-world use case implementations, both on a small scale, within the laboratory-like and flexible space of the PIA research project, and on a large scale at Yale, exemplified by the LUX platform that provides access to (meta)data from YUL, YCBA, YUAG, and YPM.
The focus is therefore on digital infrastructures capable of delivering JSON-LD files from the above specifications, which are primarily, though not exclusively, CH resources. It is more about the different actors – both human and non-human – that create and maintain these interconnected systems and the dynamic interactions that sustain them. The deployment of various LOUD specifications addresses the need for semantic interoperability between CH resources and disparate datasets by establishing a standardised approach to representing and linking data, ensuring that information can be seamlessly shared and understood across different platforms and contexts.
This dissertation seeks to carve out a distinct niche by addressing an often-overlooked aspect of IIIF and Linked Art. IIIF is sometimes perceived and studied merely as a service or an appendix, with the content it delivers taking precedence. However, this PhD thesis positions IIIF as a first-class citizen worthy of in-depth study. Similarly, Linked Art, despite its potential and its relatively recent establishment, has been the subject of very few scholarly papers. This gap underscores the significance of LOUD in this context. Furthermore, this thesis elevates Linked Art to a position of primary importance, recognising its potential and advocating for its thorough examination. To thoroughly study LOUD and its adherence to design principles, it is essential to immerse ourselves actively in both communities – an approach I have embraced for years. The thesis also emphasises the importance of participatory efforts and collaboration between research projects, which typically have shorter lifespans, and memory institutions, which need to implement technical standards as a lingua franca. In doing so, it reveals the mediating role of LOUD in advancing the heritage sphere. To truly understand IIIF, Linked Art, and to a lesser extent WADM, it is crucial to examine the social fabrics and consensus decision-making of each community. Among these considerations are how the specifications can be implemented pragmatically, and how the standards can support the implementation and maintenance of more extensive semantic interoperability efforts.
@@ -1038,7 +1038,7 @@In the context of the Internet, it is important to note that much of what we know today about it is the result of developments by many individuals and organisations. However, a significant milestone was the development of the TCP/IP protocol by Vinton Cerf and Robert E. Kahn in the 1970s (see Cerf & Kahn, 1974). This protocol became the standard networking protocol on the ARPANET in 1983, marking the beginning of the modern Internet (Leiner et al., 1997). Understanding the differentiation between the Internet and the web is crucial. The former is a global network of interconnected computers that communicate using Internet protocols, forming the infrastructure that enables online communication. The web, or World Wide Web, is a service built on top of the Internet, leveraging HTTP to transmit data. While the Internet provides the underlying connectivity, the web offers a way to access and share information through websites and links. This differentiation is vital in comprehending how the web, as a part of the Internet, has evolved into a versatile and ubiquitous platform supporting a wide array of applications.
This section, divided into five subsections, explores some of the key concepts underlying the Open Web Platform and Linked Data, and their applications in the CH field.
First, 3.4.1 examines the foundational principles and technologies that underpin the Open Web Platform. This includes an overview of principles, protocols such as HTTP, and the use of URIs to identify resources on the web. This part also explores the different types of web architectures such as the client-server model or the concept of web services, which allow for the exchange of data and functionality across different applications and systems.
-3.4.2 explores the vision of the web as a giant, interconnected database of structured data that can be queried and manipulated by machines. The subsection examines the technologies and standards that make up the Semantic Web, including RDF, RDFS, OWL, and SPARQL.
+3.4.2 explores the vision of the web as a giant, interconnected database of structured data that can be queried and manipulated by machines. The subsection examines the technologies and standards that make up the Semantic Web, including RDF, RDFS, OWL, and SPARQL.
Subsection 3.4.3 examines the set of principles designed to promote the publication and interlinking of data on the web. The subsection explores the four principles of Linked Data - using URIs to identify resources, using HTTP to retrieve resources, providing machine-readable data, and linking data to other data.
Subsection 3.4.4 examines the set of criteria for publishing data on the web in a way that makes it easily discoverable, accessible, and usable. The subsection describes the Five-Star and Seven-Star deployment schemes, which include criteria such as providing data in a structured format, using open standards, and providing a machine-readable license.
Finally, 3.4.5 explores the specific application of LOD in the CH domain. The subsection provides examples of how CHIs such as museums, libraries, and archives are using Linked Data to make their datasets more accessible and discoverable on the web.
@@ -1132,7 +1132,7 @@(…)
-(…)
-(…)
This section provides a summary of what has been discussed in this literature review as well as some preliminary insights with regard to the LOUD ecosystem, chiefly the design principles, communities, standards, and the implementations. It follows the flow of the present chapter and is organised into five subsequent parts. Finally, in , I end with a few reflections on why we ought to care about CH data in the wider sense.
-(…)
+This section provides a summary of what has been discussed in this literature review as well as some preliminary insights with regard to the LOUD ecosystem, chiefly the design principles, communities, standards, and the implementations. It follows the flow of the present chapter and is organised into five subsequent parts. Finally, in 3.7.7, I end with a few reflections on why we ought to care about CH data in the wider sense.
+CH data are unique and require different – or adjacent – methods of analysis from quantitative scientific data. The diversity of CH data, including tangible, intangible, and natural heritage, presents a challenge in preserving and promoting these resources effectively. Representing CH data in digital form can also be challenging, as it may lead to a loss of context and complexity. To address these challenges, a comprehensive understanding of the various types of heritage resources, their meanings, and values is necessary, along with an effective preservation and valorisation strategy that considers their heterogeneity. For this, I find that we need a vision that takes into account the diversity of human and non-human entities, entangled in community-based socio-technical activities, whether local, interdisciplinary or global. The following provides a concise summary of the key aspects explored in the first section, addressing the distinctive nature of CH data.
+The dimensions of CH data: I argue that three key dimensions need to be taken into consideration for CH data: heterogeneity, knowledge latency, and custodianship.
+Representing and embodying CH data: I argue that the faithful recreation of materiality in digitised and born-digital resources is a significant issue and that such resources need to be treated as new types of affordances or surrogates. Moreover, while achieving a high degree of digital material fidelity may be challenging, it is important to acknowledge that accessibility remains a paramount concern in this context.
+Collectives and apparatuses are entangled: Actors, encompassing individuals, institutions, local and global communities, along with the (digital) infrastructures and their components, are deeply interwoven. These human and non-human entities collaboratively shape and navigate the complex networks of interactions and technologies that form society. Within this context, ANT emerges as a valuable theoretical framework for understanding these close relationships and interdependencies.
+A variety of different metadata standards have emerged, tailored to different functions and purposes, with some dating back to the 1960s, particularly in library settings. However, the majority of other traditional CH metadata standards have emerged between the 1990s and the 2000s. In terms of descriptive metadata standards and conceptual models, MARC, ISAD(G), CDWA, CIDOC-CRM are being supplemented – or gradually substituted – by models and profiles like LRM, RiC, and Linked Art, which – to some extent – align with Linked Data principles, enabling richer relationships and enhancing access to vast and diverse collections.
+These evolving metadata standards support a more interconnected, open, and accessible information landscape, enabling researchers, practitioners, and the public to navigate and explore CH resources more effectively[113], especially when the standards are used in conjunction with controlled vocabularies, facilitating cross-domain reconciliation. As for the aggregation of CH (meta)data, web-based alternatives to OAI-PMH appear to be viable options, using AS in particular.
+CH data are intrinsically intertwined with broader trends, including Big Data and AI, serving as both a reflection and a product of these influences. Yet it is vital to recognise that CHIs and DH practitioners are – or should be – instrumental in shaping and curating such data. They not only shape and curate them, but also inform and drive the very trends that permeate our data landscape. Their values, actions and methodologies represent critical drivers in the process, guiding the collection, preservation, and interpretation of CH data.
+The scientific movements and guiding principles of Open Science/Open Scholarship, Citizen Science/Citizen Humanities, FAIR, CARE, and Collections as Data collectively permeate and shape CH data. While the first two represent dynamic movements actively involving scholarly practices or the public, the other three are more concerned with the system in which data is situated. LOUD design principles and standards, which are more content-centric, bolster the effectiveness of these principles and movements, fostering open and accessible CH data through their real-world applicability.
+The Open Web Platform and Linked Data are foundational to the evolution of scholarly research and CH practices, enabling the creation of federated datasets and KGs. The section explores these concepts and their applications in the CH domain.
+The web architecture, underpinned by principles, protocols, and identifiers like URI, URL, and URN, facilitates the exchange of data and functionality across applications and systems. It emphasises architectural principles such as orthogonality and protocol-based interoperability and explores various web architectures, including the client-server model and SOA.
+The Semantic Web, a vision for a web understood by machines, relies on standards like RDF, RDFS, OWL, and SPARQL. The limitations of RDF in complex KR can be mitigated to some extent with RDF reification and quoted triples. SHACL is also introduced for data graph validation.
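As a brief illustration of the classic RDF reification mentioned above, the sketch below asserts a statement about a statement – here, recording the source of an attribution – in JSON-LD; all example.org identifiers are placeholders.

```json
{
  "@context": {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/"
  },
  "@id": "https://example.org/statement/1",
  "@type": "rdf:Statement",
  "rdf:subject": { "@id": "https://example.org/object/1" },
  "rdf:predicate": { "@id": "dcterms:creator" },
  "rdf:object": { "@id": "https://example.org/person/1" },
  "dcterms:source": { "@id": "https://example.org/catalogue/entry-42" }
}
```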
+Linked Data principles promote the publication and interlinking of data on the web, creating a web of data that is navigable and usable. Challenges in Linked Data implementation include GUIs, application architectures, schema mapping, link maintenance, licensing, trust, quality, relevance, and privacy; addressing them enhances the web’s potential as an open, interconnected platform.
+Deployment schemes like the Five-Star and Seven-Star models provide criteria for publishing open data. These models address the clarity, usability, and applicability of open data, emphasising schema documentation and data quality.
+Finally, the application of LOD in the CH sector is explored through examples like Europeana. Despite its potential in improving data quality and visibility, challenges persist, including issues related to cataloguing, adoption of new standards, and the complexity of Linked Data terminology. The section underscores the need for collaboration and community-driven practices for effective LOD implementation.
+LOUD focuses on improving data accessibility primarily for software developers. It balances data completeness and practical considerations like scalability and usability. Its design principles are: the right abstraction for the audience, few barriers to entry, comprehensibility by introspection, documentation with working examples, and few exceptions in favour of many consistent patterns.
+The systematic review of LOUD in scholarly literature employed the weight of evidence framework. A Boolean query identified relevant papers, yielding 46 relevant references from 2018 to 2023, mainly in English. These papers were grouped into four main categories: mentions of LOUD, descriptions of LOUD, explanations of the LOUD design principles, and comparative analyses where the LOUD principles have been reused in various applications.
+LOUD integrates technologies, mostly community-driven, like IIIF, WADM, and Linked Art. IIIF facilitates the sharing of high-resolution images and audiovisual content through a series of specifications; WADM provides a standard for creating and sharing annotations across various platforms (see the example below); and Linked Art offers a model and an API specification for semantically describing CH. Together, they demonstrate a transformative potential in how CH data is interacted with and understood, reshaping traditional humanities and opening new research opportunities.
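+For illustration, a minimal WADM annotation – with hypothetical URIs – commenting on a region of a IIIF Canvas could look as follows:

```json
{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "id": "https://example.org/anno/1",
  "type": "Annotation",
  "motivation": "commenting",
  "body": {
    "type": "TextualBody",
    "value": "A haystack in the foreground",
    "language": "en",
    "format": "text/plain"
  },
  "target": "https://example.org/iiif/canvas/p1#xywh=100,100,640,480"
}
```

+Because the target is simply a URI (here with a media fragment on a Canvas), the same annotation can be rendered by any WADM-aware client, including IIIF viewers such as Mirador.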
+In the exploration of community dynamics and the intricacies of data exchange, two axes or perspectives come into focus in my PhD: community practices and semantic interoperability. These axes represent not just isolated concepts but comprehensive frameworks that influence how communities function and thrive in an interconnected world. Community practices, as shared activities and rituals, weave the fabric of collective identity within communities, while semantic interoperability acts as the bridge for meaningful and truthful data exchange between different systems.
+Both community practices and semantic interoperability will permeate the empirical parts of the thesis as I explore LOUD for CH through different prisms. These axes serve as critical lenses through which we can dissect and analyse the dynamics, challenges, and opportunities that arise within this landscape.
+One thing that has been partially touched upon but not strongly asserted in this chapter is ‘why do we really care about cultural heritage?’
+The importance of CH data as primary and secondary sources for DH practitioners transcends the mere definition, interlinking, and preservation of these sources. The undertaking, notwithstanding technological assistance, can be inherently challenging, contingent upon numerous interdependencies.
+Moreover, it is deeply rooted in our response to many pressing global challenges, including the far-reaching consequences of climate change, largely caused and accelerated by human activities. We live in an era characterised by profound disruptions – heatwaves, fires, droughts, floods, rising sea levels, and the resultant migrations driven by these environmental changes – which affect not only humans but the entire biosphere. It therefore becomes increasingly important to emphasise the societal responsibilities that accompany our engagement with CH as well as DH practices in the Anthropocene (see Nowviskie, 2015).
As Jim Clifford taught me, we need stories (and theories) that are just big enough to gather up the complexities and keep the edges open and greedy for surprising new and old connections. (Haraway, 2016, p. 101)
@@ -1295,7 +1338,7 @@8. Yale’s LUX and LOUD Consistency
(…)
9. Discussion
-[Il] faut renoncer à l’idée d’une interopérabilité syntaxique ou structurelle par l’utilisation d’un modèle unique, qu’il s’agisse de la production, de stockage ou de l’exploitation au sein même d’un [système d’information]. (Poupeau, 2018) [113]
+[Il] faut renoncer à l’idée d’une interopérabilité syntaxique ou structurelle par l’utilisation d’un modèle unique, qu’il s’agisse de la production, de stockage ou de l’exploitation au sein même d’un [système d’information]. (Poupeau, 2018) [114]
This chapter presents a comprehensive discussion in which I interpret, analyse, and critically examine my findings in relation to the thesis and the wider application of LOUD. Through an in-depth analysis of the design principles of LOUD and their implications for CH, this discussion aims to demonstrate the many challenges and opportunities inherent in this framework. The focus is on achieving community-driven consensus, rather than simply pursuing technological breakthroughs.
The following sections are organised to provide a comprehensive review of the empirical findings, an evaluation of LOUD at different levels of abstraction, and a retrospective analysis of the research journey. Firstly, in Section 9.1, I will present a summary of the empirical findings from my research. This will include key themes and insights, structured to reflect the different areas of study and practice within LOUD.
@@ -1305,7 +1348,7 @@9.1 Empirical Findings
This section summarises the empirical findings of my research and already offers some suggestions. The structure does not follow the exact order of the three empirical chapters but is organised around overarching topics that emerged throughout the study. The seven topics include Community Practices and Standards, Inclusion and Marginalised Groups, Maintenance and Community Engagement, Interoperability and Usability, Future Directions and Sustainability, Digital Materiality and Representation, as well as Challenges of Scaling and Implementation.
Community Practices and Standards
GitHub serves as a vital hub for community involvement, with a core group of active contributors often attending meetings regularly. This platform simplifies decision-making within the community, although it also reflects biases similar to those in FLOSS communities. Behind visible activities like meetings, there is substantial preparatory work managed by co-chairs and editorial boards, or driven by community-generated use cases. This foundational work often determines the direction and outcomes of formal gatherings. The LUX project at Yale, as seen in Chapter 8, has successfully fostered collaboration across various units, bringing together libraries and museums on a unified platform. The technological foundation of LUX, based on open standards, facilitates data integration and cross-collections discovery.
-Not only does the deployment of FLOSS tools contribute to these achievements, but it also emphasises the social advantages of working collaboratively. The concept of the Tragedy of the Commons, as described by (Hardin, 1968), highlights the potential for individual self-interest to deplete shared resources. However, (Ostrom, 1990) offers a counterpoint by demonstrating how communities can successfully manage common resources through collective action and shared norms. In this context, initiatives like the CHAOSS initiative[114] play a significant role by providing metrics that help evaluate the health and sustainability of open source communities. These metrics include contributions, issue resolution times, and community growth, offering valuable insights into how collaborative efforts can be maintained and improved.
+Not only does the deployment of FLOSS tools contribute to these achievements, but it also emphasises the social advantages of working collaboratively. The concept of the Tragedy of the Commons, as described by Hardin (1968), highlights the potential for individual self-interest to deplete shared resources. However, Ostrom (1990) offers a counterpoint by demonstrating how communities can successfully manage common resources through collective action and shared norms. In this context, initiatives like CHAOSS[115] play a significant role by providing metrics that help evaluate the health and sustainability of open source communities. These metrics include contributions, issue resolution times, and community growth, offering valuable insights into how collaborative efforts can be maintained and improved.
Reaching consensus is another critical aspect of community practices and standards. While the minutes of meetings are valuable artefacts, they often reflect an Anglo-Saxon approach to decision-making characterised by few substantive points and critical turning points. The formal aspects of conversations captured in minutes do not fully encompass the decision-making process, which frequently involves informal conversations, consensus-building through open dialogue, and subtle cues that influence outcomes. These elements are integral to the English and American approach and hold valuable lessons for an international community. IIIF and Linked Art are international communities, but decisions are made in English and the majority of participants are based in North America and the UK, significantly imprinting this approach. Understanding these nuances can help us improve our collaborative efforts within the IIIF and Linked Art communities. By recognising and appreciating these different facets of decision-making, we can learn from each other and enhance our collective ability to make effective and inclusive decisions.
Some of the challenges associated with these practices include the major demand on resources for community building, the slowness inherent in distributed development, and the difficulty in achieving consensus. Additionally, the concept of social sustainability can be seen as an imaginary construct that papers over differences, as discussed by (…). Addressing these challenges is crucial for the long-term success and effectiveness of the IIIF and Linked Art communities.
Inclusion and Marginalised Groups
@@ -1324,7 +1367,7 @@9.1 Empirical Findings
I would suggest as a way forward for the IIIF and Linked Art communities to focus on further improving usability of the specifications. This includes conducting comprehensive usability assessments of APIs to evaluate the experiences of new developers versus existing ones, understanding the steepness of the learning curve associated with each API, and guiding improvements in documentation, on-boarding processes, and overall developer support. Efforts should be made to lower the barriers to entry for new developers by developing more intuitive and user-friendly tutorials, providing example projects, and creating a robust support community. Ensuring that developers can quickly and effectively leverage APIs will foster greater adoption. Addressing the challenges of transitioning between different versions of specifications is critical, and developing tools and guidelines that help maintain consistency across versions will reduce friction and ensure smoother updates.
Future Directions and Sustainability
Survey findings, as discussed in , underscore the need for ongoing efforts to develop LOUD standards that foster an inclusive, dynamic digital ecosystem. Future strategies should include creating educational resources and frameworks that support interdisciplinary collaboration and reduce barriers to participation. While the Manifest serves as the fundamental unit within IIIF, Linked Art can play a similarly central role as a semantic gateway in broader contexts, allowing round-tripping across the APIs (one plausible linking pattern is sketched below). The topic modelling exercise in LUX, detailed in Chapter 8, reveals complex actor-networks of organisations, individuals, and non-human actors, providing insights into the relationships sustaining the LUX initiative.
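One plausible shape of such a gateway, sketched here in the spirit of Linked Art’s conventions for referencing digital objects, is a Linked Art record that points to the IIIF Presentation API resource rendering the same object, so that clients can hop between the two APIs. All identifiers are hypothetical.

```json
{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "https://example.org/object/24",
  "type": "HumanMadeObject",
  "_label": "Photographic album",
  "subject_of": [
    {
      "type": "DigitalObject",
      "_label": "IIIF Manifest of the album",
      "conforms_to": [
        { "id": "http://iiif.io/api/presentation", "type": "InformationObject" }
      ],
      "access_point": [
        { "id": "https://example.org/iiif/manifest/24", "type": "DigitalObject" }
      ]
    }
  ]
}
```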
-The next steps for Linked Art might involve forming a new consortium independent of a CIDOC Working Group, which could provide the necessary support to sustain the initiative. Alternatively, integrating Linked Art into IIIF as a new TSG and specification could address the discovery challenges within IIIF, as discussed during the birds of a feather session led by Robert Sanderson (see Raemy, 2024) at the 2024 IIIF Conference in Los Angeles[115]. Design principles that act as bridges across different disciplines, as proposed by (Roke & Tillman, 2022), are crucial. IIIF has demonstrated that this collaborative approach is feasible, and Linked Art could follow in its footsteps. However, achieving this requires increased dedication from passive members and broader adoption of the model and the API ecosystem in the near future.
+The next steps for Linked Art might involve forming a new consortium independent of a CIDOC Working Group, which could provide the necessary support to sustain the initiative. Alternatively, integrating Linked Art into IIIF as a new TSG and specification could address the discovery challenges within IIIF, as discussed during the birds-of-a-feather session led by Robert Sanderson (see Raemy, 2024) at the 2024 IIIF Conference in Los Angeles[116]. Design principles that act as bridges across different disciplines, as proposed by Roke and Tillman (2022), are crucial. IIIF has demonstrated that this collaborative approach is feasible, and Linked Art could follow in its footsteps. However, achieving this requires increased dedication from passive members and broader adoption of the model and the API ecosystem in the near future.
Digital Materiality and Representation
As explored in Chapter 7, the detailed digital representation of photographic albums, such as the Kreis Family Collection, demonstrates the need to comprehensively capture the materiality of digital objects. This includes the structure and context of images, which are crucial for maintaining their historical and social significance. The implementation of the IIIF Presentation API in creating a detailed digital replica of the Getty’s Bayard Album shows how digital materiality can be enhanced through thoughtful use of technology, but also highlights the scalability challenges for such detailed representations.
Creating these detailed digital representations can be seen as a ‘boutique’ approach, which, while labour-intensive and resource-demanding, is necessary for preserving the integrity and contextual significance of cultural heritage objects. The challenge lies in developing the appropriate means and methodologies to achieve this level of detail consistently. Future endeavours, whether through research projects or collaborative efforts between GLAM institutions and DH practitioners, should aim to address these challenges and create sustainable practices for digital materiality and representation. As Edwards aptly notes:
@@ -1334,7 +1377,7 @@9.1 Empirical Findings
Challenges of Scaling and Implementation
As seen in Chapter 6, the IIIF Cookbook recipes and Linked Art patterns reflect the tension between creating advanced specifications and their practical implementation. This gap between ideation and real-world application underscores the challenges faced by the community in achieving broad adoption and interoperability. In Chapter 7, the exploration of APIs like the IIIF Change Discovery API illustrates the practical challenges and potential of scaling these technologies for wider adoption. The successful implementation in PIA demonstrates viability, but also points to the need for continued development and community engagement to fully realise the benefits.
-Furthermore, assessing the scalability of IIIF image servers, as discussed by (Duin, 2022) and exemplified by the firm Q42 with their Edge-based service Micrio[116], highlights the importance of optimising data performance. Erwin Verbruggen aptly noted that ‘optimising data performance in my opinion mens sending as little data over as needed’[117], emphasising the need for efficient data handling to enhance scalability. This insight reinforces the necessity of continual refinement in scaling digital infrastructure to support broader use and integration.
+Furthermore, assessing the scalability of IIIF image servers, as discussed by Duin (2022) and exemplified by the firm Q42 with their edge-based service Micrio[117], highlights the importance of optimising data performance. Erwin Verbruggen aptly noted that ‘optimising data performance in my opinion mens [sic] sending as little data over as needed’[118], emphasising the need for efficient data handling to enhance scalability. This insight reinforces the necessity of continual refinement in scaling digital infrastructure to support broader use and integration.
Reflecting on these findings, I would like to assert that continuous participation, particularly for institutions that can afford to be part of initiatives like IIIF-C, is essential. Active members should not only focus on their own use cases but also consider the needs and perspectives of other, perhaps marginalised, groups. Achieving the dual goals of making progress within one community, whether it be IIIF or Linked Art, while also engaging in effective outreach and creating a solid baseline, will benefit everyone in the CH sector and beyond. Addressing where LOUD fits in, how people perceive this new concept or paradigm, and understanding how LOUD differs from Linked Data in general are essential. These questions help to clarify the stages at which themes related to one of the LOUD design principles emerge, crystallise, and potentially disappear. My thesis does not fully resolve these queries but offers insights and hints for further exploration.
In conclusion, the empirical findings reveal the richness of the implementation and maintenance of LOUD standards in the CH domain. From the critical role of community practices and standards to the challenges of achieving interoperability and inclusivity, each theme underlines the complex interplay of social, technical, and organisational factors. The next section will look at the evaluation of LOUD, exploring its overall impact and delving into the open question of what to do with it, particularly in terms of Linked Data versus LOUD, where my thesis provides pointers rather than definitive answers.
Referring to Figure 4.2, the following is a descriptive attempt to provide levels of abstraction of LOUD based on my empirical findings, focusing particularly on the deployment of IIIF within PIA and Linked Art within the LUX framework, aside from the data model abstraction level.
Representation and Display: For PIA, the implementation of Leaflet provided an immediate and easy-to-integrate viewer to display high-resolution digitised images of CAS photographs. The context is accessible through accompanying metadata and related links on the GUI. Although not LOUD-driven per se, it functions as a mediator through the IIIF Image API. Balancing between immediacy and hypermediacy, the Mirador instance enabled the display of IIIF Presentation API resources with machine-generated annotations. We also incorporated the V3 plug-in to manipulate images[118]. However, we failed to provide a robust authentication layer allowing users to add their own annotations easily, highlighting the limitations of a four-year research project not primarily aimed at tool development but at proposing a participatory system. IIIF-compliant software can aid in this, yet development needs to be community-driven rather than individualised. Exhibit was the only tool used for educational and teaching purposes that was well-received, though integration issues persisted. LUX exemplifies Linked Art hypermediacy, where the structure of the JSON-LD representation drives their GUI, including URL syntax[119]. For both PIA and LUX, JSON-LD serves as an interface for certain users (software developers, data curators, data scientists). While resources can become BOs depending on the viewer, a few inconsistencies can still be overcome and will likely be understood by humans reading the files.
+Representation and Display: For PIA, the implementation of Leaflet provided an immediate and easy-to-integrate viewer to display high-resolution digitised images of CAS photographs. The context is accessible through accompanying metadata and related links on the GUI. Although not LOUD-driven per se, it functions as a mediator through the IIIF Image API. Balancing between immediacy and hypermediacy, the Mirador instance enabled the display of IIIF Presentation API resources with machine-generated annotations. We also incorporated the V3 plug-in to manipulate images[119]. However, we failed to provide a robust authentication layer allowing users to add their own annotations easily, highlighting the limitations of a four-year research project not primarily aimed at tool development but at proposing a participatory system. IIIF-compliant software can aid in this, yet development needs to be community-driven rather than individualised. Exhibit was the only tool used for educational and teaching purposes that was well-received, though integration issues persisted. LUX exemplifies Linked Art hypermediacy, where the structure of the JSON-LD representation drives their GUI, including URL syntax[120]. For both PIA and LUX, JSON-LD serves as an interface for certain users (software developers, data curators, data scientists). While resources can become BOs depending on the viewer, a few inconsistencies can still be overcome and will likely be understood by humans reading the files.
+Data Model: The data model of IIIF is primarily driven by its design principles and WADM. Moreover, the main unit is the Manifest, often a digitisation or representation of a physical object, meaning the Presentation API is key to achieving an acceptable level of interoperability. The Shared Canvas Data Model is still there, baked into the specifications, but one does not really need to know about it to understand how IIIF works from Version 2 of the main APIs onwards; it is a piece of history, though. IIIF is Linked Data, but it carries little semantic value and should not be treated as RDF triples. One could almost say the same about Linked Art, as it is not necessary to fully understand CIDOC-CRM to start using its model or to deploy an API endpoint. However, some basics do need to be understood, such as the event-based modelling viewpoint and the important classes with their rdfs:domain and rdfs:range (a minimal example follows below). Linked Art has also bent some rules and created properties and classes to meet the needs of the community. As far as implementation goes, I would suggest implementing the Linked Art API endpoints directly and consistently, rather than starting with the data model, as I see cross-institutional interoperability through interfaces as a more important milestone than data modelling as a pastime for the few specialists. For both PIA and LUX, dedicated data models were designed to be consistent with the specifications, with some internal structure and data in LUX for their own purposes. Semantic data in PIA, as realised through the Omeka S JSON-LD API, did not go beyond templating a few Linked Art resources and the workflow carried out with the University of Oxford; there is no production PIA Linked Art API at the moment.
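+To give a flavour of this event-based viewpoint without requiring full CIDOC-CRM literacy, here is a minimal, hypothetical Linked Art description in which the production of an object is modelled as an event with its own time span:

```json
{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "https://example.org/object/1",
  "type": "HumanMadeObject",
  "_label": "Photographic print",
  "identified_by": [
    {
      "type": "Name",
      "content": "Haymaking near Lucerne",
      "classified_as": [
        {
          "id": "http://vocab.getty.edu/aat/300404670",
          "type": "Type",
          "_label": "Primary Name"
        }
      ]
    }
  ],
  "produced_by": {
    "type": "Production",
    "timespan": {
      "type": "TimeSpan",
      "begin_of_the_begin": "1940-01-01T00:00:00Z",
      "end_of_the_end": "1949-12-31T23:59:59Z"
    }
  }
}
```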
+Infrastructure: Serialisation mock-ups and JSON-LD templates on GitHub were the starting point for modelling IIIF Manifests and Collections for the PIA research project (see the sketch below). Laravel and then Omeka S were the two main elements, in two different iterations, leveraged to present the IIIF resources. While single-image Manifests were quite easy to serialise, the integration of more detailed representations of photographic albums presented challenges. The set-up was efficient enough in a laboratory setting, but a more robust infrastructure is definitely needed in the long run.
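+As a sketch of what such templates produce, a single-image Manifest in IIIF Presentation API 3.0 follows the Manifest → Canvas → AnnotationPage → Annotation chain; all identifiers below are placeholders:

```json
{
  "@context": "http://iiif.io/api/presentation/3/context.json",
  "id": "https://example.org/iiif/manifest/1",
  "type": "Manifest",
  "label": { "en": ["Single photograph"] },
  "items": [
    {
      "id": "https://example.org/iiif/canvas/p1",
      "type": "Canvas",
      "height": 3000,
      "width": 4000,
      "items": [
        {
          "id": "https://example.org/iiif/page/p1",
          "type": "AnnotationPage",
          "items": [
            {
              "id": "https://example.org/iiif/anno/p1-image",
              "type": "Annotation",
              "motivation": "painting",
              "body": {
                "id": "https://example.org/iiif/image/1/full/max/0/default.jpg",
                "type": "Image",
                "format": "image/jpeg",
                "height": 3000,
                "width": 4000,
                "service": [
                  {
                    "id": "https://example.org/iiif/image/1",
                    "type": "ImageService3",
                    "profile": "level2"
                  }
                ]
              },
              "target": "https://example.org/iiif/canvas/p1"
            }
          ]
        }
      ]
    }
  ]
}
```

+Albums complicate this pattern: each opening needs its own Canvas, and Ranges are required to convey the album’s structure, which is where the ‘boutique’ effort discussed earlier comes in.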
Algorithmic and Computational Processes: PIA relied on virtual machines and had the necessary Kakadu licence embedded in our SIPI instance to encode the images. If the former proved difficult as performance was sometimes an issue, the latter was a good option as serving JPEG2000 images cannot currently rely on FLOSS solutions which are too slow. The LUX pipeline and the use of MarkLogic as a multi-modal database are examples of the data engineering expertise and outsourcing solutions required for such a platform. Some open source solutions, such as QLever[120], a high-performance SPARQL engine, may also offer some hope to institutions that are not well-funded and need robust knowledge graph-oriented solutions.
+Algorithmic and Computational Processes: PIA relied on virtual machines and had the necessary Kakadu licence embedded in our SIPI instance to encode the images. While the former proved difficult, as performance was sometimes an issue, the latter was a good option, since serving JPEG2000 images cannot currently rely on FLOSS solutions, which are too slow. The LUX pipeline and the use of MarkLogic as a multi-model database are examples of the data engineering expertise and outsourcing solutions required for such a platform. Some open source solutions, such as QLever[121], a high-performance SPARQL engine, may also offer some hope to institutions that are not well-funded and need robust knowledge graph-oriented solutions.
The dual simplicity and complexity of implementing LOUD specifications and participating in community-led efforts can be attributed to the need for a reorientation of research projects. It is essential for these projects to engage actively in community processes rather than intermittently presenting their progress and subsequently withdrawing. This ongoing engagement fosters a more robust and collaborative environment, ultimately contributing to the advancement of shared goals and standards. Such a reorientation necessitates a fundamental change in how universities and GLAM institutions operate, extending their involvement beyond the immediate project scope to ensure sustained participation and impact.
@@ -1359,7 +1402,7 @@I have faced challenges in moving many of the models developed within PIA into (beta) production, and the usability requirements of APIs have scarcely been addressed. However, the findings from this thesis should be viewed as starting points rather than conclusive solutions. The unseen aspect of this dissertation is my active involvement in both communities and my attempts to reciprocate this engagement within PIA. Each investigation presented could have warranted a dedicated thesis, indicating the breadth and depth of the topics explored. Ultimately, this work merely scratches the surface of numerous subjects, laying the groundwork for future research and development.
The next section will offer a retrospective on the work accomplished during this PhD thesis. It will reflect on the various milestones achieved, the lessons learned, and the potential directions for future research.
In this retrospective[121], I will offer an analysis of the research journey. This section will interpret the findings to situate LOUD as fully-fledged actors within the CH field. It will reflect on the challenges, achievements, and lessons learned throughout the research process, providing a holistic view of the project’s trajectory and its implications for the future of LOUD.
+In this retrospective[122], I will offer an analysis of the research journey. This section will interpret the findings to situate the LOUD specifications as fully-fledged actors within the CH field. It will reflect on the challenges, achievements, and lessons learned throughout the research process, providing a holistic view of the project’s trajectory and its implications for the future of LOUD.
The empirical findings of my research reveal the nuanced interplay between socio-technical practices and implementations, synthesising insights through both thematic and abstract lenses. This dual approach underscores the importance of fostering collaboration and effective decision-making, while addressing biases and promoting inclusivity. The need for ongoing maintenance, interoperability and usability remains paramount, as does the development of educational resources and consortia to sustain initiatives. In addition, capturing digital materiality and addressing scalability challenges are critical to the widespread integration of LOUD standards. These findings lay the groundwork for future research and development aimed at bridging operational applications with more extensive design approaches.
How can the LOUD specifications be situated as fully-fledged actors within the CH field? Reflecting on the notion of graceful degradation, frequently mentioned during the 2024 IIIF Conference, LOUD specifications embody this concept perfectly. Even if not all embedded patterns of a given API-compliant resource are correctly interpreted or rendered by a client, some of its basic features should still be displayed. This flexibility is crucial for ensuring the broad usability and adaptability of LOUD standards, allowing them to transcend institutional boundaries and serve as robust mediums of knowledge transfer. To paraphrase Poupeau’s (2018) quote at the beginning of this chapter, there is no unique model for interoperability, but there are definitely best sociotechnical practices to be learned from IIIF and Linked Art. The act of participation prevails over the relatively easy and one-off deployment of specifications for the short term.
By using LOUD, CH data can be effectively interlinked with different datasets, resulting in numerous potential benefits. An overriding benefit is the improved discoverability and accessibility of CH resources, facilitating enhanced search and retrieval capabilities. In addition, the adoption of LOUD promotes seamless data sharing and reuse within academic and memory institutions, fostering a culture of collaboration and interdisciplinary knowledge exchange. This approach not only enhances the overall utility and comprehensiveness of CH repositories, but also promotes collective understanding and appreciation of diverse cultural assets and historical narratives.
@@ -1763,38 +1806,43 @@JSON-LD will be discussed through examples in . ↩︎
Author’s translation: ‘We need to give up on the idea of syntactic or structural interoperability through the use of a single model, whether for producing, storing or managing data within an information system’. ↩︎
+It is worth noting that although archives and museums have standardised their metadata practices later than libraries, they seem to have identified core models that can be more easily implemented to meet Linked Data principles. ↩︎
+Author’s translation: ‘We need to give up on the idea of syntactic or structural interoperability through the use of a single model, whether for producing, storing or managing data within an information system’. ↩︎
CHAOSS: https://chaoss.community/ ↩︎
+CHAOSS: https://chaoss.community/ ↩︎
IIIF Annual Conference and Showcase - Los Angeles, CA, USA - June 4-7, 2024: https://iiif.io/event/2024/los-angeles/ ↩︎
+IIIF Annual Conference and Showcase - Los Angeles, CA, USA - June 4-7, 2024: https://iiif.io/event/2024/los-angeles/ ↩︎
Micrio: https://micr.io/ ↩︎
+Micrio: https://micr.io/ ↩︎
Message written on the IIIF Slack Workspace on 28 October 2022. ↩︎
+Message written on the IIIF Slack Workspace on 28 October 2022. ↩︎
mirador-image-tools: https://github.com/ProjectMirador/mirador-image-tools ↩︎
+mirador-image-tools: https://github.com/ProjectMirador/mirador-image-tools ↩︎
For instance, this user interface view of Claude Monet (1840-1926): https://lux.collections.yale.edu/view/person/642a0152-1567-4fbe-93f3-66f11c5cab9a and its Linked Art counterpart: https://lux.collections.yale.edu/data/person/642a0152-1567-4fbe-93f3-66f11c5cab9a ↩︎
+For instance, this user interface view of Claude Monet (1840-1926): https://lux.collections.yale.edu/view/person/642a0152-1567-4fbe-93f3-66f11c5cab9a and its Linked Art counterpart: https://lux.collections.yale.edu/data/person/642a0152-1567-4fbe-93f3-66f11c5cab9a ↩︎
The title of the section is an homage to Bruno Latour and a passage found in his book ‘We have never been modern’. ↩︎
+The title of the section is an homage to Bruno Latour and a passage found in his book ‘We have never been modern’. ↩︎