This project processes EpiDoc TEI XML files and presents them as a static website.
It uses a monorepo structure with two main components: an ETL (Extract, Transform, Load) process for handling XML files, and a web application for presenting the processed data.
The main components of the project are:
- packages/
  - etl/: ETL package for processing XML
  - frontend/: Static site generator web application
- data/
  - processed/: Output data generated by the etl package after processing the raw data
  - raw/: Git submodule for the EpiDoc files
- xslt/
  - epidoc/: Git submodule for XSLT stylesheets
  - start-edition.sef.json: Compiled version of the XSLT to convert the XML files into HTML
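Because the EpiDoc files and the XSLT stylesheets come from Git submodules, a fresh clone may contain empty directories where they should be. The following is a minimal sketch of a sanity check, not part of the project itself: the function name `check_submodules` is hypothetical, and it assumes that `data/raw` and `xslt/epidoc` sit directly under the repository root, as the listing above suggests.

```shell
# Hypothetical helper: report submodule directories that are missing or empty.
# Paths are assumed to be relative to the repository root (default: current directory).
check_submodules() {
  root="${1:-.}"
  rc=0
  for dir in data/raw xslt/epidoc; do
    # An uninitialised submodule is typically left as an empty directory.
    if [ -z "$(ls -A "$root/$dir" 2>/dev/null)" ]; then
      echo "missing or empty: $dir"
      rc=1
    fi
  done
  return "$rc"
}
```

Calling `check_submodules` from the repository root returns a non-zero status if either directory is missing or empty, which usually means the submodules still need to be initialised.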
graph TD
A[EpiDoc Submodule] --> B[ETL Process]
X[XSLT Submodule] --> B
B -->|Transform XML| C[Saxon-JS]
C -->|HTML Fragments| D[Processed HTML]
B -->|Extract Corpus Data| E[JSON Data]
D --> F[Static Site Generator]
E --> F
G[Markdown Files] -.-> F
F -->|Generate Pages| H[Static HTML]
H -.->|Index| I[Pagefind]
E -.->|Map Data| F
H -.-> J[Interactive Map]
- Clone this repository
- Initialise and update the submodules
    git submodule update --init --recursive
- Install dependencies
    npm install
- Run the ETL process
    npm run etl
- Run the development server
    npm run frontend:dev
The project should be available at http://localhost:5173/.
Static pages are added to the site via Markdown files; Markdown support is provided by mdsvex. To add a page, create a new entry in the frontend/src/routes/ directory.
First, create a new sub-directory in the routes directory. For example, to add a new page called "about", create a directory called about and add a +page.md file to it.
The +page.md file should contain the Markdown content for the page. The page will then be accessible at http://PROJECT_URL/about.
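The "about" example above could be created from the command line roughly as follows. This is only a sketch: it assumes the commands are run from the repository root, and the page body is placeholder text rather than real content.

```shell
# Run from the repository root. "about" is the example page name used above.
page_dir="frontend/src/routes/about"
mkdir -p "$page_dir"

# Write placeholder Markdown content; replace it with the real page text.
cat > "$page_dir/+page.md" <<'EOF'
# About

Placeholder text for the about page.
EOF
```

With the development server running, the new page should then be reachable under the /about path.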
New editorial content should be added in the research branch. This branch is automatically deployed to the preview site in GitHub Pages, together with the develop and main branches.
Content that needs to be visible to the public should be added to the main branch. Changes to the main branch must be made via a pull request.
The site is automatically deployed, via a GitHub Actions workflow, to GitHub Pages whenever there are commits to the develop, main or research branches.
The preview site is available at https://kingsdigitallab.github.io/corpus-building/.