Mismatched dumped taskdoc and database state #231
Hi @cote3804, this is a somewhat unexpected error. In fact, to avoid any pointless serialization/deserialization, the content of the […]

This could be a good place to check whether the information being inserted actually contains the new field. As an additional test, you can also set the […] to verify that the remote_job_data.json corresponds to the correct one from the worker.
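A concrete way to run the suggested check on the worker is to load the dumped document and walk down to the new field. Below is a stdlib-only sketch; the dotted path `output.my_new_field` is a made-up example, not an actual atomate2 field name:

```python
import json
from pathlib import Path

def has_field(doc: dict, dotted: str) -> bool:
    """Walk a dotted path (e.g. 'output.my_new_field') through a nested dict."""
    node = doc
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    return True

# In a real debugging session you would load the file dumped on the worker:
#     doc = json.loads(Path("remote_job_data.json").read_text())
# Here a toy document stands in for it:
doc = {"output": {"my_new_field": 1.23}, "metadata": {}}
print(has_field(doc, "output.my_new_field"))
print(has_field(doc, "output.missing"))
```

Running the same check against the document fetched from the database then tells you on which side of the transfer the field disappears.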
Hi @gpetretto, thanks for the explanation of where to look in the code and how to store the documents on the runner machine. After some debugging by running locally on the worker, it turns out the atomate2 code was the issue. After loading the JSON there was some additional post-processing that wasn't storing the new field in the doc that was uploaded to the database. I was under the impression that the […]

Thanks for the help!
Hi @cote3804, good to hear that you managed to fix your issue, and thanks for reporting the results of your investigation. Still, my intention was to avoid passing through the processes of (de)serialization as much as possible, but apparently that is not what happens. Based on your tests, can you give me more details about the point where the atomate2 code was invoked during the completion of the job? Maybe the process can be optimized compared to how it is now.
When I was testing on the worker machine I would use […] Is that what you wanted? I'm not sure I fully understand your question. Part of my confusion is that I don't understand where jobflow-remote is calling the atomate2 code. I assumed it was all on the remote worker, and that once the remote_job_data.json was moved back to the runner machine it would be directly uploaded to the database.
Hi @cote3804, sorry for the delay and for not being very clear in the previous message. Did you manage to locate the part of the code in jobflow-remote where this additional post-processing is invoked?
Hi @gpetretto, coming back to this after a few days, I've realized that the issue is fully resolved and there's no unexpected post-processing being done by jobflow-remote on the […]

Unrelatedly, is there an open issue or discussion on batching of jobs on one […]? Thanks!
Good to hear the issue is solved. Coming to your other question: yes, there is a way to run several jobflow Jobs in a single SLURM job. In the released version it is only possible to run multiple jobflow Jobs sequentially in a single SLURM job. In the current develop branch it is also possible to run multiple jobflow Jobs in parallel at the same time (on top of the sequential execution). If you are going to use that, I would suggest using the development branch, because I also fixed a couple of bugs related to batch submission. The updated documentation is here: https://github.com/Matgenix/jobflow-remote/blob/develop/doc/source/user/advancedoptions.rst#batch-submission

In case you are using the 0.1.4 version and switch to the development branch, you will likely encounter an issue with the upgrade procedure; here is the explanation of how to deal with it: #211 (comment)
Wow, you all have built quite an excellent tool! I'll happily provide feedback as I use it. In terms of specifying resources for individual jobs being run in parallel in a batch (i.e. specifying […])
Jobflow-remote does not have direct control on how the jobs are being executed. I think it could be possible to set the number of cores/nodes available in a place accessible from inside the jobflow Job, but this means that the Job implementation would be strictly bound to jobflow-remote as a job manager.

```yaml
exec_config:
  multi_conf_1n_24p:
    modules:
      - ...
    export:
      atomate2_VASP_CMD: '"srun --nodes 1 -n 24 --exclusive vasp_std"'
```

I suppose you could do the same for FHI-AIMS.
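For clarity, the `export` entry above amounts to the following in the shell environment of the worker job (a sketch of the effect only; the actual submission script layout is generated by jobflow-remote):

```shell
# The exec_config export sets an environment variable before the job runs;
# atomate2's pydantic-based settings pick it up instead of the default vasp_std.
export atomate2_VASP_CMD='"srun --nodes 1 -n 24 --exclusive vasp_std"'
echo "$atomate2_VASP_CMD"
```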
You've proposed a good solution. Using the following `exec_config`:

```yaml
exec_config:
  multi_conf_1n_24p:
    modules:
      - ...
    export:
      ATOMATE2_VASP_CMD: '"srun --nodes 1 -n 24 --exclusive vasp_std"'
```

Regarding case sensitivity, based on the following source code snippet:

```console
werner@x13dai-t:~/Public/repo/github.com/materialsproject/atomate2.git$ cd "/home/werner/Public/repo/github.com/materialsproject/atomate2.git" && ug -I -A5 -B5 _ENV_PREFIX

src/atomate2/settings.py
8-
9- from pydantic import Field, model_validator
10- from pydantic_settings import BaseSettings, SettingsConfigDict
11-
12- _DEFAULT_CONFIG_FILE_PATH = "~/.atomate2.yaml"
13: _ENV_PREFIX = "atomate2_"
14-
15-
16- class Atomate2Settings(BaseSettings):
17-     """
18-     Settings for atomate2.
--
213-     )
214-     ABINIT_MAX_RESTARTS: int = Field(
215-         5, description="Maximum number of restarts of a job."
216-     )
217-
218:     model_config = SettingsConfigDict(env_prefix=_ENV_PREFIX)
219-
220-     # QChem specific settings
221-
222-     QCHEM_CMD: str = Field(
223-         "qchem", description="Command to run standard version of qchem."
--
254-         This allows setting of the config file path through environment variables.
255-         """
256-         from monty.serialization import loadfn
257-
258-         config_file_path = values.get(key := "CONFIG_FILE", _DEFAULT_CONFIG_FILE_PATH)
259:         env_var_name = f"{_ENV_PREFIX.upper()}{key}"
260-         config_file_path = Path(config_file_path).expanduser()
261-
262-         new_values = {}
263-         if config_file_path.exists():
264-             if config_file_path.stat().st_size == 0:

tests/common/test_settings.py
1- from pathlib import Path
2-
3- import pytest
4- from pydantic import ValidationError
5-
6: from atomate2.settings import _DEFAULT_CONFIG_FILE_PATH, _ENV_PREFIX, Atomate2Settings
7-
8-
9- def test_empty_and_invalid_config_file(
10-     clean_dir, monkeypatch: pytest.MonkeyPatch, capsys: pytest.CaptureFixture[str]
11- ):
12-     # test no warning if config path is default and file does not exist
13:     env_var_name = f"{_ENV_PREFIX.upper()}CONFIG_FILE"
14-     monkeypatch.delenv(env_var_name, raising=False)
15-     settings = Atomate2Settings()
16-     assert settings.CONFIG_FILE == _DEFAULT_CONFIG_FILE_PATH
17-     stdout, stderr = capsys.readouterr()
18-     assert stdout == ""
```

We can see and summarize that the settings are read with the environment prefix `_ENV_PREFIX = "atomate2_"`, and that atomate2's own code and tests always build the variable names in uppercase via `f"{_ENV_PREFIX.upper()}{key}"`, i.e. `ATOMATE2_CONFIG_FILE`, `ATOMATE2_VASP_CMD`, and so on.

This approach maintains compatibility with atomate2 while allowing flexible resource configuration in jobflow-remote. See here and here for the related documentation and discussion.
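The naming rule extracted from the snippet above can be reproduced in a few lines of plain Python (a sketch: `_ENV_PREFIX` is copied from atomate2's `settings.py`, while the `env_var_name` helper is mine):

```python
import os

_ENV_PREFIX = "atomate2_"  # value copied from src/atomate2/settings.py

def env_var_name(key: str) -> str:
    # Same construction atomate2 uses internally:
    # env_var_name = f"{_ENV_PREFIX.upper()}{key}"
    return f"{_ENV_PREFIX.upper()}{key}"

# Setting the uppercase form is what atomate2's own tests do:
os.environ[env_var_name("VASP_CMD")] = "srun --nodes 1 -n 24 --exclusive vasp_std"
print(env_var_name("VASP_CMD"))        # ATOMATE2_VASP_CMD
print(os.environ["ATOMATE2_VASP_CMD"])
```

Note that pydantic-settings matches environment variables case-insensitively by default, which is why the lowercase `atomate2_VASP_CMD` also worked; the uppercase form simply matches the naming used in atomate2's own code and tests.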
Based on https://github.com/materialsproject/atomate2/blob/main/tests/common/test_settings.py, I wrote a more comprehensive test, `test_atomate2_settings.py`:

```python
from pathlib import Path

import pytest
from pydantic import ValidationError

from atomate2.settings import (
    _DEFAULT_CONFIG_FILE_PATH,
    _ENV_PREFIX,
    Atomate2Settings,
)


def test_empty_and_invalid_config_file(
    clean_dir, monkeypatch: pytest.MonkeyPatch, capsys: pytest.CaptureFixture[str]
):
    # test no warning if config path is default and file does not exist
    env_var_name = f"{_ENV_PREFIX.upper()}CONFIG_FILE"
    monkeypatch.delenv(env_var_name, raising=False)
    settings = Atomate2Settings()
    assert settings.CONFIG_FILE == _DEFAULT_CONFIG_FILE_PATH
    stdout, stderr = capsys.readouterr()
    assert stdout == ""
    assert stderr == ""

    # set path to load settings from with ATOMATE2_CONFIG_FILE env variable
    config_file_path = Path.cwd() / "test-atomate2-config.yaml"
    monkeypatch.setenv(env_var_name, str(config_file_path))
    settings = Atomate2Settings()
    assert str(config_file_path) == settings.CONFIG_FILE
    assert settings.SYMPREC == 0.1
    assert settings.BANDGAP_TOL == 1e-4
    assert settings.VASP_RUN_BADER is False
    assert settings.VASP_RUN_DDEC6 is False
    assert settings.DDEC6_ATOMIC_DENSITIES_DIR is None

    # test warning if config file exists but is empty
    config_file_path.touch()
    with pytest.warns(UserWarning, match=f"Using {env_var_name} at .+ but it's empty"):
        Atomate2Settings()

    # test error if the file exists and contains invalid YAML
    with open(config_file_path, "w") as file:
        file.write("invalid yaml")
    with pytest.raises(SyntaxError, match=f"{env_var_name} at"):
        Atomate2Settings()

    # test error if the file exists and contains invalid settings
    with open(config_file_path, "w") as file:
        file.write("VASP_CMD: 42")
    with pytest.raises(
        ValidationError,
        match="1 validation error for Atomate2Settings\nVASP_CMD\n "
        "Input should be a valid string ",
    ):
        Atomate2Settings()

    # another invalid setting
    with open(config_file_path, "w") as file:
        file.write("BANDGAP_TOL: invalid")
    with pytest.raises(
        ValidationError,
        match="1 validation error for Atomate2Settings\nBANDGAP_TOL\n "
        "Input should be a valid number",
    ):
        Atomate2Settings()

    # test warning if config path is non-default and file does not exist
    config_file_path.unlink()
    with pytest.warns(UserWarning, match=f"{env_var_name} at .+ does not exist"):
        Atomate2Settings()


@pytest.fixture
def clean_dir(tmp_path):
    """Provide a clean directory for testing."""
    import os

    original_dir = Path.cwd()
    os.chdir(tmp_path)
    yield tmp_path
    os.chdir(original_dir)


@pytest.fixture
def config_file(tmp_path):
    config = tmp_path / "test-atomate2-config.yaml"
    config.write_text(
        """
VASP_CMD: vasp_from_config
SYMPREC: 0.2
BANDGAP_TOL: 0.002
VASP_RUN_BADER: false
CP2K_CMD: cp2k_from_config
"""
    )
    return config


def test_environment_variables_override_config_file(monkeypatch, config_file):
    env_var_name = f"{_ENV_PREFIX.upper()}CONFIG_FILE"
    monkeypatch.setenv(env_var_name, str(config_file))
    monkeypatch.setenv(f"{_ENV_PREFIX.upper()}VASP_CMD", "vasp_from_env")
    monkeypatch.setenv(f"{_ENV_PREFIX.upper()}SYMPREC", "0.3")
    monkeypatch.setenv(f"{_ENV_PREFIX.upper()}BANDGAP_TOL", "0.003")
    monkeypatch.setenv(f"{_ENV_PREFIX.upper()}VASP_RUN_BADER", "true")
    settings = Atomate2Settings()

    # Check that the config file is being used
    assert settings.CONFIG_FILE == str(config_file), "Config file path should be correctly set"

    # Check priorities
    assert settings.VASP_CMD == "vasp_from_env", "Environment variable should override config file"
    assert settings.SYMPREC == 0.3, "Environment variable should override config file"
    assert settings.BANDGAP_TOL == 0.003, "Environment variable should override config file"
    assert settings.VASP_RUN_BADER is True, "Environment variable should override config file"
    assert settings.CP2K_CMD == "cp2k_from_config", "Config file value should be used when no environment variable is set"


def test_default_values_when_not_set(monkeypatch, tmp_path):
    empty_config = tmp_path / "empty_config.yaml"
    empty_config.touch()
    env_var_name = f"{_ENV_PREFIX.upper()}CONFIG_FILE"
    monkeypatch.setenv(env_var_name, str(empty_config))
    settings = Atomate2Settings()

    # Check that the empty config file is being used
    assert settings.CONFIG_FILE == str(empty_config), "Empty config file path should be correctly set"

    # Check default values
    assert settings.VASP_CMD == "vasp_std", "Should use default value when not set in config or environment"
    assert settings.SYMPREC == 0.1, "Should use default value when not set in config or environment"
    assert settings.BANDGAP_TOL == 0.0001, "Should use default value when not set in config or environment"
    assert settings.VASP_RUN_BADER is False, "Should use default value when not set in config or environment"
    assert settings.CP2K_CMD == "cp2k.psmp", "Should use default value when not set in config or environment"
```

and ran it:

```console
(datasci) werner@x13dai-t:~/test$ pytest test_atomate2_settings.py -v
======================================================================== test session starts =========================================================================
platform linux -- Python 3.11.1, pytest-8.3.3, pluggy-1.5.0 -- /home/werner/.pyenv/versions/3.11.1/envs/datasci/bin/python3
cachedir: .pytest_cache
rootdir: /home/werner/test
plugins: split-0.10.0, dash-2.16.1, nbmake-1.5.4, anyio-4.6.2.post1, cov-6.0.0, mock-3.14.0
collected 3 items

test_atomate2_settings.py::test_empty_and_invalid_config_file PASSED                                                                                           [ 33%]
test_atomate2_settings.py::test_environment_variables_override_config_file PASSED                                                                              [ 66%]
test_atomate2_settings.py::test_default_values_when_not_set PASSED                                                                                             [100%]

========================================================================== warnings summary ==========================================================================
test_atomate2_settings.py::test_empty_and_invalid_config_file
  /home/werner/.pyenv/versions/3.11.1/envs/datasci/lib/python3.11/site-packages/pydantic/main.py:214: UserWarning: ATOMATE2_CONFIG_FILE at /tmp/pytest-of-werner/pytest-4/test_empty_and_invalid_config_0/test-atomate2-config.yaml does not exist
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)

test_atomate2_settings.py::test_default_values_when_not_set
  /home/werner/.pyenv/versions/3.11.1/envs/datasci/lib/python3.11/site-packages/pydantic/main.py:214: UserWarning: Using ATOMATE2_CONFIG_FILE at /tmp/pytest-of-werner/pytest-4/test_default_values_when_not_s0/empty_config.yaml but it's empty
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=================================================================== 3 passed, 2 warnings in 0.29s ====================================================================
```
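The precedence these tests verify (environment variable > config file > built-in default) can be sketched with a stdlib-only toy loader; the function and names below are mine for illustration, not atomate2's actual API:

```python
import json
import os
import tempfile

PREFIX = "ATOMATE2_"

def load_settings(config_path: str, defaults: dict) -> dict:
    """Toy illustration of the precedence: defaults < config file < env vars."""
    values = dict(defaults)
    if os.path.exists(config_path) and os.path.getsize(config_path) > 0:
        with open(config_path) as fh:
            values.update(json.load(fh))  # config-file values override defaults
    for key in values:
        env = os.environ.get(PREFIX + key)
        if env is not None:
            values[key] = env  # environment variables override everything
    return values

with tempfile.TemporaryDirectory() as tmp:
    cfg = os.path.join(tmp, "config.json")
    with open(cfg, "w") as fh:
        json.dump({"VASP_CMD": "vasp_from_config"}, fh)
    os.environ["ATOMATE2_VASP_CMD"] = "vasp_from_env"
    settings = load_settings(cfg, {"VASP_CMD": "vasp_std", "CP2K_CMD": "cp2k.psmp"})
    print(settings["VASP_CMD"])  # env wins over config file
    print(settings["CP2K_CMD"])  # default survives, nothing else set it
```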
Thanks for the help @hongyi-zhao and @gpetretto. Configuring the srun arguments with the […]

I've been testing batch submission and have come across an issue where jobs in the […]

I was able to copy the files in the […]

Is there a better way to handle file cleanup and resetting the state of a timed-out job, or are these features under development right now? This discussion is now quite far from the title of the post, so I can open a new issue to discuss if you all think that's better.
Hi @cote3804, thanks a lot for the feedback on the batch submission!

You mean that the SLURM job reached the walltime before the jobflow Job reached the end? Also, it is not clear why, if there was no Job in the […]

If, despite fixing the bug, situations like the one that you mention keep arising, I can provide a […] to move those files, but maybe this could be handled in […]

Anyway, indeed it would be better to move this discussion to a dedicated issue. Can you open a new one on this topic?
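In the meantime, manually restoring downloaded files into a job's run directory is just a copy. A generic sketch (all paths here are hypothetical placeholders, not jobflow-remote's actual directory layout):

```python
import shutil
import tempfile
from pathlib import Path

def restore_files(download_dir: Path, run_dir: Path) -> list[str]:
    """Copy every regular file from a (hypothetical) download directory back
    into the job's run directory, creating it if needed. Returns copied names."""
    run_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(download_dir.iterdir()):
        if src.is_file():
            shutil.copy2(src, run_dir / src.name)
            copied.append(src.name)
    return copied

# Toy demonstration with temporary directories standing in for the real paths:
with tempfile.TemporaryDirectory() as tmp:
    dl = Path(tmp) / "download"
    run = Path(tmp) / "run"
    dl.mkdir()
    (dl / "remote_job_data.json").write_text("{}")
    copied = restore_files(dl, run)
    print(copied)
```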
I haven't been able to reproduce this issue and all of the file operations I did when I reported this have since been overwritten by normal operation of the code. If it arises again (and I suspect it will) I'll open a new issue! Thanks |
Hi Devs,

I'm working on my own pymatgen and atomate2 forks to implement additional parsed fields for FHI-AIMS calculations. I added an additional field to the aims CalculationOutput. I can see in the remote directory created on the worker that the value is parsed correctly and written to the remote_job_data.json file. However, when I query the database, that particular field is not present. The states of both pymatgen and atomate2 are consistent on the runner and worker machines. Is there a good way to debug this issue? I assume the error is happening during some runner communication, and I'm not quite sure how to go about analyzing the problem.

Thanks!