Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do store/expose logs for (at least failed) runs. #72

Open
yarikoptic opened this issue Mar 6, 2024 · 21 comments
Open

Do store/expose logs for (at least failed) runs. #72

yarikoptic opened this issue Mar 6, 2024 · 21 comments
Assignees

Comments

@yarikoptic
Copy link
Member

yarikoptic commented Mar 6, 2024

Prior ref:

So we do store the logs from each run combined across all assets, e.g.

(venv) dandi@drogon:~/cronlib/dandisets-healthstatus$ ls -lta  results/000003/ | head -n 6
total 4280
drwxr-xr-x 331 dandi dandi 12288 Mar  6 10:36 ..
drwxr-xr-x   2 dandi dandi 86016 Mar  6 10:36 .
-rw-r--r--   1 dandi dandi 16530 Mar  6 10:36 status.yaml
-rw-r--r--   1 dandi dandi 90740 Jan  6 00:34 2024.01.03.12.21.19_matnwb_nwbRead_errors.log
-rw-r--r--   1 dandi dandi  1125 Nov 18 02:13 2023.11.18.00.50.33_matnwb_nwbRead_errors.log
(venv) dandi@drogon:~/cronlib/dandisets-healthstatus$ grep '^Asset' results/000003/2024.01.03.12.21.19_matnwb_nwbRead_errors.log | nl | tail -n 2
    80  Asset: sub-YutaMouse57/sub-YutaMouse57_ses-YutaMouse57-161014_behavior+ecephys.nwb
    81  Asset: sub-YutaMouse57/sub-YutaMouse57_ses-YutaMouse57-161016_behavior+ecephys.nwb

so to access a fail for some sample failing healthcheck, someone needs to

  • have access to drogon
  • login to it
  • find folder
  • find most recent log file which would have asset in question listed

which is quite cumbersome. So, ideally we should have them committed to git and pushed directly here on github. The problem is that we can't keep history growing since then repo would grow too large too quickly. But I think we will be ok if we keep only the single state commit and rewrite it. So let's

  • create logs/ submodule (https://github.com/dandi/dandisets-healthstatus-logs repo, created now)
  • while running a test on a file, create corresponding to the asset path folder (e.g. sub-YutaMouse57/sub-YutaMouse57_ses-YutaMouse57-161014_behavior+ecephys.nwb/
  • write under that folder stderr per test, e.g matnwb_nwbRead_errors.txt
  • if file ends up empty after run -- remove
  • when saving results from a run, commit changes to logs/ --ammend'ing prior commit, or pretty much can recreate that branch from scratch every time, just reusing the tree of prior state
  • force push logs/

In the README.md here, per each dandiset row, add there a convenience link to the logs, so row looks like

000019 [logs] | 30 passed, 1 failed, 0 timed out | 0 passed, 31 failed, 0 timed out | —

Ideally, if it is feasible, if we could as a comment in results/ yaml files urls to logs, like

tests:
- assets_nok:
  - sub-EC2/sub-EC2_ses-EC2-B1.nwb  # https://github.com/dandi/dandisets-healthstatus-logs/blob/main/000019/sub-EC2/sub-EC2_ses-EC2-B1.nwb/pynwb_open_load_ns_errors.txt
  assets_ok:
  - sub-EC2/sub-EC2_ses-EC2-B105.nwb
  name: pynwb_open_load_ns

that would have been great, but I have not looked how feasible it is yet.

@yarikoptic
Copy link
Member Author

ping on this

@yarikoptic
Copy link
Member Author

@rly please sign up to this issue to know when you can get to the logs seamlessly

@jwodder
Copy link
Member

jwodder commented Jul 15, 2024

@yarikoptic

The problem is that we can't keep history growing since then repo would grow too large too quickly. But I think we will be ok if we keep only the single state commit and rewrite it.

Are you sure that's necessary? I expect the repository will end up containing a large number of small files either way, so most of the space consumption will be due to the amount of files rather than their contents. Moreover, seeing as we're currently having performance issues with just testing one asset per Dandiset, and assets that haven't been tested recently are preferred, I expect that individual files in the repository will change infrequently, so rewriting history still won't save much space.


  • Are the files in logs/ supposed to always store the stderr from the most recent run for the respective asset & test, or should only the stderr from failed tests be stored? What about storing stdout?

  • If we're moving the results/{dandiset_id}/*.log files to this new repository, can/should the status.yaml files be moved there as well? That would help separate the code from its output.

    • If we keep the status.yaml files in this repository, then all paths in results/ would be of the form results/{dandiset_id}/status.yaml. Could we just collapse them down to results/{dandiset_id}.yaml?

@jwodder
Copy link
Member

jwodder commented Jul 15, 2024

@yarikoptic Also, if we move all of results/ to this new repository, and commits to it are performed by dandsets-healthstatus directly (rather than by a follow-up datalad save invocation as is currently done), then I think it would be a good idea for the commit messages to include the total runtime of the command that made the commit (possibly with other details like the Git commit hash of the source code repo) so that we can analyze dandisets-healthstatus runtimes over time.

@jwodder
Copy link
Member

jwodder commented Jul 19, 2024

@yarikoptic Ping.

@yarikoptic
Copy link
Member Author

  • I don't mind separating code into a separate repository. That's how we got https://github.com/dandi/dandi-api-webshots-tools in the past etc.
  • I love idea of making commits more informative about any given run! FWIW commit message could be provided to datalad's .save
  • indeed it might be useful/informative to store stdout as well. May be in the fashion of chronic tool though -- store/keep only if test fails.
  • with stdout/stderr in a separate repo, I still feel that it is better to keep results/{dandiset_id}/status.yaml even though it does cost more for git (blob + tree objects instead of just a blob) and filesystem. May be because I envision having a symlink logs/ under it pointing to overall logs submodule/{dandiset_id}/ path or smth like that
  • history rewrite: indeed ATM we have lots of errors and fails so there would be lots of files but not that many changes from each run. But in the bright future I expect number of failures to decrease considerably, hence we would arrive to situation with a lean tree, while carrying burden of all prior runs. So we do need a solution to reduce such a burden. Potential alternatives/aspects I see :
    • use git-annex for the logs, then we store only symlinks, and can offload/drop old content in the future. Cons: cannot upload to github, would need to push to smth like datasets.datalad.org
    • use git notes mechanism.
    • some other ways to (ab)use git-objects and store in git while not creating linear history but then going from some common commit (or orphans) and tagged. I don't know if git then anyhow would diff those objects internally, possibly leading to more storage consumption?
    • do create history but then eventually discard it periodically (kinda what git-annex forget does). With code separation it should be feasible.
    • overall: your call... I feel that we might get away with aforementioned rewrite alternative

@jwodder
Copy link
Member

jwodder commented Jul 19, 2024

@yarikoptic

  • So should stdout/stderr only be saved for failed tests or for all tests that produce something nonempty?
  • Should stdout and stderr be collected as separate streams or as one stream?
  • Other than stdout & stderr, should the log files contain any other information about the tests that produced them? (Cf. the fields mentioned in Proposal: Store a summary of every test run in a dedicated log file #82)
  • Do you want the status.yaml files moved to the new logs repo or not? If yes, how exactly should this repo be laid out? I'm unclear on what the submodule/{dandiset_id}/ path is you're contemplating having logs/ symlinked to.
  • Should the generated README.md file be moved to the new logs repo?
  • When this is implemented, can I delete all the old results/{dandiset_id}/*.log files currently in the repository on drogon?

@jwodder
Copy link
Member

jwodder commented Jul 25, 2024

@yarikoptic Ping.

1 similar comment
@jwodder
Copy link
Member

jwodder commented Jul 29, 2024

@yarikoptic Ping.

@yarikoptic
Copy link
Member Author

  • So should stdout/stderr only be saved for failed tests or for all tests that produce something nonempty?

Let's store for all for starters -- might be worth comparing between successful and failing runs

  • Should stdout and stderr be collected as separate streams or as one stream?

I feel we better separate.

could we use some field as test_id to bind together the two? if not - let's add uuid of some kind to those records?

  • Do you want the status.yaml files moved to the new logs repo or not? If yes, how exactly should this repo be laid out? I'm unclear on what the submodule/{dandiset_id}/ path is you're contemplating having logs/ symlinked to.

status.yaml would stay in this repo as/where it is now. Then let's create outputs/ top level folder pointing to submodule for https://github.com/dandi/dandisets-healthstatus-outputs, which would have {dandiset_id}/{test_id}.stdout and {dandiset_id}/{test_id}.stderr files? I am open to suggestions

  • Should the generated README.md file be moved to the new logs repo?

no, keep it here

  • When this is implemented, can I delete all the old results/{dandiset_id}/*.log files currently in the repository on drogon?

I guess so

@jwodder
Copy link
Member

jwodder commented Jul 30, 2024

@yarikoptic

could we use some field as test_id to bind together the two? if not - let's add uuid of some kind to those records?

There's no single field in the issue 82 records (yet) that uniquely identifies each record, so adding a unique ID field (whether UUID or some other format) is the only option.

Then let's create outputs/ top level folder pointing to submodule for https://github.com/dandi/dandisets-healthstatus-outputs

Why the name change from https://github.com/dandi/dandisets-healthstatus-logs?

which would have {dandiset_id}/{test_id}.stdout and {dandiset_id}/{test_id}.stderr files?

I'm not a fan of that file path format. I'd prefer sticking to the initial idea of {dandiset_id}/{asset_path}/{test_name}.log and having each .log file contain the test ID, stdout, and stderr in some structured format. This would make it easy to replace the logs for an asset & test with those for a later run of the same asset & test.

@jwodder
Copy link
Member

jwodder commented Aug 5, 2024

@yarikoptic Ping on responses to above.

@yarikoptic
Copy link
Member Author

Ok, let's keep logs/ and https://github.com/dandi/dandisets-healthstatus-logs and your path format.

@jwodder
Copy link
Member

jwodder commented Aug 5, 2024

@yarikoptic So, to summarize:

  • A logs/ submodule will be added to this repository, pointing to https://github.com/dandi/dandisets-healthstatus-logs

  • dandisets-healthstatus will capture the stdout & stderr (as separate streams) of test commands.

  • The event records for Proposal: Store a summary of every test run in a dedicated log file #82 will include a unique ID for each record (hereafter referred to as the "test ID").

  • Whenever a test command's stdout and/or stderr is nonempty (even if the test passed), the captured streams — along with the test ID — will be written to the path {dandiset_id}/{asset_path}/{test_name}/outputs.yaml within the logs/ submodule.

    • Also include test status/outcome and environment ID?
  • Whenever a test command's stdout and stderr are both empty, the file at the above path will be deleted if it exists.

  • At the end of a check or test-files --save-results run, all changes made to the logs/ submodule will be committed & pushed.

  • After implementing this, delete all the old results/{dandiset_id}/*.log files currently in the repository on drogon.

Is that correct & complete?

@yarikoptic
Copy link
Member Author

Uff, starts to become tricky to keep track of discussion(s) in multiple issues etc. Next time we better do it as some design doc PR, since then it is easier to polish particular aspect(s) with suggestions etc. Overall it sounds correct but I would like to bring it together with discussion of #82 for individual records since it should be coherent, and both "outputs" (logs) and individual test run logs (records) IMHO would be what would require eventually trimming and possibly "forgetting" git history to reduce the size. So I think we should keep them both in a separate repository and thus could keep them "nearby" each other. This repository would remain "lightweight summary".

Using convention of // to be submodule (as we use in con/tinuous and datalad addurls etc):

  • environments.yaml - detailed description of environments used for testing as described in this thread with keys like 20240725-092554_hdmf-3.14.2_matnwb-2.6.0.2_pynwb-2.8.1 where date would come first whenever new unique env got detected
  • results/{dandiset_id}/status.yaml - the summary over assets for the dandiset. Produced out from logs//{dandiset_id}/*/*/records.log
  • logs//{dandiset_id}/{asset_path}/{test_name}/records.jsonl - individual run records of Proposal: Store a summary of every test run in a dedicated log file #82. Probably jsonl is the best to ease extending the file (let's forget "per year" for now, we could later filter using a tool during" forgetting")
  • logs//{dandiset_id}/{asset_path}/{test_name}/stdout.log (and stderr.log) - captured stdout/stderr if any. I do not see a point of using any serialization like yaml/json for those. Hopefully eventually it all would get silent and we would have none.

The behaviors you described above otherwise I think are good.

@jwodder
Copy link
Member

jwodder commented Aug 6, 2024

@yarikoptic See #88 for my thoughts on multiple repositories.

  • logs//{dandiset_id}/{asset_path}/{test_name}/stdout.log (and stderr.log) - captured stdout/stderr if any. I do not see a point of using any serialization like yaml/json for those.

The test ID still needs to be stored somewhere. This was why I suggested storing all three values (test ID, stdout, and stderr) in a single YAML file.

@yarikoptic
Copy link
Member Author

eh, if we didn't care about number of files/inodes, then we could add one more file, but indeed probably let's then just have a yaml outputs.yaml there. Although since already a .yaml, might even be worth adding that test run information (status and environment) there as well I guess.

@jwodder
Copy link
Member

jwodder commented Aug 6, 2024

@yarikoptic

In #82 (comment), you requested that an "untested" event be emitted whenever a new asset was found in a Dandiset. Is there supposed to be only one such event per asset — in which case, how would that fit in with the above path format? — or do you want one event for each test?

There's also the fact that non-NWB assets are currently "tested" by running file on them and getting their size (but only for --mode all; the other modes skip such assets), and the closest thing this pseudo-test has to a name is "untested", as that's the key used for the results in status.yaml files.

@yarikoptic
Copy link
Member Author

yarikoptic commented Aug 6, 2024

be kinda lame to populate with multiple "untested" events.
hm, no immediate idea on how/where depict that we have such (non nwb) files "untested". WDYT?

@jwodder
Copy link
Member

jwodder commented Aug 8, 2024

@yarikoptic

be kinda lame to populate with multiple "untested" events.
hm, no immediate idea on how/where depict that we have such (non nwb) files "untested". WDYT?

If you're asking about the "untested" pseudo-test run on non-NWB assets, I don't see a problem with logging the file output and size to logs//{dandiset_id}/{asset_path}/{test_name}/records.jsonl like a normal test.

If you're instead asking about the "untested" status for new assets, both NWB and non-NWB (which should really be renamed to "new", possibly alongside renaming "untested"), one approach would be to log them to a Dandiset-wide record file at logs//{dandiset_id}/.dandi/records.jsonl.

@yarikoptic
Copy link
Member Author

my concern was more of the latter. Indeed probably would not make sense for logs//{dandiset_id}/{asset_path}/records.jsonl to just say that it was not tested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants