Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: incremental archival builds #4572

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

marcelgerber
Copy link
Member

@marcelgerber marcelgerber commented Feb 18, 2025

Part of #4434.

This PR adds

  • a DB table archived_chart_versions, keeping track of any archived versions
  • hashing, based on the checksums of all indicators and the chart config (all coming from the DB)
  • a detection of which graphers need to be incrementally rebuilt, based on these hashes and the DB table
  • the generation of a more useful manifest

To run this:

yarn buildViteSite
VITE_PREVIEW=true npx tsx --tsconfig tsconfig.tsx.json ./baker/archival/archiveChangedGrapherPages.ts --latestDir
npx serve archive

Then navigate to http://localhost:3000.

The first run of this might take some time: on the staging server, it took 17 minutes.
Any runs after the first run are fast, as they are incremental (on staging: 5 seconds for one changed chart).

You can then change a grapher config in the admin, re-run the script, and see that it only re-archives that one changed grapher.


export interface DbEnrichedArchivedChartVersion
extends Omit<DbPlainArchivedChartVersion, "manifest"> {
manifest: Record<string, any> // TODO: Turn into more specific type

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new Todo was discovered. If it is not a priority right now,consider marking it for later attention.
TODO: Turn into more specific type

  • Ignore

@owidbot
Copy link
Contributor

owidbot commented Feb 18, 2025

Quick links (staging server):

Site Dev Site Preview Admin Wizard Docs

Login: ssh owid@staging-site-archived-chart-versions

SVG tester:

Number of differences (default views): 0 ✅
Number of differences (all views): 0 ✅

Edited: 2025-02-18 20:57:55 UTC
Execution time: 1.21 seconds

export const hashAndWriteFile = async (
filename: string,
content: string,
archiveDir: string
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO for myself: clean up the archiveDir param - we don't really need it here
(same goes for below)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much of this has been moved into ArchivalBaker, but it's also changed enough that a re-review of that file might be useful.

Comment on lines +157 to +162
return {
[variableId]: await bakeVariableDataFiles(
variableId,
archiveDir
),
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a curiosity here in that we re-bake variable files even if they didn't change (their checksums are the same), but the grapher config of an underlying chart changed.

What's happening in practice is that the variable file is then re-fetched, re-hashed, and is then overriding the existing file under the same name.

It's not ideal, but in the grand scheme of things it's not a worry also - the median incremental update will probably touch less than 10 charts and less than 20 variables or so.

Comment on lines +268 to +273
if (copyToLatestDir) {
const latestDir = path.join(archiveDir, "latest")
await fs.remove(latestDir)
await fs.copy(dir, latestDir)
console.log(`Copied ${dir} to ${latestDir}`)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: Currently, this just makes latest be the exact same as the latest timestamp folder.
We'll want to change this to be cumulative: Copy all files contained in [latestDate] over, but don't delete any of the others already contained in latest.

}
})

export const assembleManifest = async (manifestInfo: {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcelgerber marcelgerber requested a review from rakyi February 18, 2025 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants