Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: naming of modules #142

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

maxdinkel
Copy link
Member

@maxdinkel maxdinkel commented Feb 28, 2025

Description and Context

Currently, the cluster pipeline is failing because with import pickle in queens.utils.remote_operations.py the queens.utils.pickle.py is imported instead of the standard library. This bug was introduced with #136 (my bad 👼 ).
This PR refactors the pickle.py utils into the io.py utils to solve this issue.

Related Issues and Pull Requests

How Has This Been Tested?

Cluster pipeline

Checklist

  • My commit messages mention the appropriate issue numbers (advised but optional).
  • I mention the appropriate issue numbers in this PR.
  • I updated documentation where necessary.
  • I have added tests to cover my changes.

Additional Information

Interested Parties

Possible reviewers:

Other interested parties:

@maxdinkel maxdinkel requested a review from gilrrei February 28, 2025 08:04
@maxdinkel maxdinkel self-assigned this Feb 28, 2025
@maxdinkel maxdinkel added high priority High priority issue minor: refactor Minor refactor to circumvent the 24h rule labels Feb 28, 2025
leahaeusel
leahaeusel previously approved these changes Feb 28, 2025
@maxdinkel
Copy link
Member Author

maxdinkel commented Feb 28, 2025

I guess we have the same problem with numpy.py utils (see pipeline).
tensorflow.py and gpflow.py could also be problematic in the future. Any suggestions how to resolve that?

gilrrei
gilrrei previously approved these changes Feb 28, 2025
@gilrrei
Copy link
Member

gilrrei commented Feb 28, 2025

I guess we have the same problem with numpy.py utils (see pipeline). tensorflow.py and gpflow.py could also be problematic in the future. Any suggestions how to resolve that?

What about adding something like _utils, _wrapper or _queens for these cases? I know it breaks the convention. However, I personally feel that it is more than justified, usability and clarity always comes first IMO

@leahaeusel
Copy link
Member

I guess we have the same problem with numpy.py utils (see pipeline). tensorflow.py and gpflow.py could also be problematic in the future. Any suggestions how to resolve that?

What about adding something like _utils, _wrapper or _queens for these cases? I know it breaks the convention. However, I personally feel that it is more than justified, usability and clarity always comes first IMO

As ugly as it is, I agree that we need to break the convention in such cases. I would prefer to simply append the package name, here _utils. We should run into the same problem with our Numpy data processor, right?

@maxdinkel maxdinkel dismissed stale reviews from gilrrei and leahaeusel via dc03ab3 February 28, 2025 09:26
@leahaeusel
Copy link
Member

iterators/pymc.py, schedulers/dask.py, and utils/pymc.py are also problematic, although I would argue that the pymc iterator should be renamed anyway such that the new name includes the actual method, e.g. mcmc_pymc.py

@maxdinkel
Copy link
Member Author

I am starting to think that we should name all modules with a leading underscore. The user should import the modules through the __init__.py anyway and will not notice those leading underscores.

@leahaeusel
Copy link
Member

I am starting to think that we should name all modules with a leading underscore. The user should import the modules through the __init__.py anyway and will not notice those leading underscores.

At first glance, this doesn't sound like a bad idea to me. Are there any unwanted side effects we need to take into account? Nevertheless, this wouldn't work for the utils modules.

@maxdinkel
Copy link
Member Author

I am starting to think that we should name all modules with a leading underscore. The user should import the modules through the __init__.py anyway and will not notice those leading underscores.

At first glance, this doesn't sound like a bad idea to me. Are there any unwanted side effects we need to take into account? Nevertheless, this wouldn't work for the utils modules.

Why would it not work for the utils modules?

@leahaeusel
Copy link
Member

At first glance, this doesn't sound like a bad idea to me. Are there any unwanted side effects we need to take into account? Nevertheless, this wouldn't work for the utils modules.

Why would it not work for the utils modules?

Because they are not imported through the __init__.py. Of course, you could still import them via queens.utils._numpy.py but the leading underscore usually indicates that these files are only for internal use and shouldn't be imported by the user, which would be misleading in this case.

@maxdinkel
Copy link
Member Author

At first glance, this doesn't sound like a bad idea to me. Are there any unwanted side effects we need to take into account? Nevertheless, this wouldn't work for the utils modules.

Why would it not work for the utils modules?

Because they are not imported through the __init__.py. Of course, you could still import them via queens.utils._numpy.py but the leading underscore usually indicates that these files are only for internal use and shouldn't be imported by the user, which would be misleading in this case.

The idea is to import the modules as import _numpy as numpy in the __init__.py. Then the user can import the util module as import queens.utils.numpy as numpy_utils. And still, when we import numpy, for example, in queens.utils.classifier.py, this will import the numpy package and not our util package.

@leahaeusel
Copy link
Member

The idea is to import the modules as import _numpy as numpy in the __init__.py. Then the user can import the util module as import queens.utils.numpy as numpy_utils. And still, when we import numpy, for example, in queens.utils.classifier.py, this will import the numpy package and not our util package.

You're completely right, I didn't think of that. Sorry for the confusion.

Just one last concern: Will the documentation display the Module names without the underscore? From a usability perspective, the documentation should be sufficient to tell the user how to import each module without having to look at the __init__.py, in my opinion.

@gilrrei
Copy link
Member

gilrrei commented Feb 28, 2025

As decided upon in a personal meeting:

  • numpy dp -> numpy_file, utils -> linalg, array
  • csv dp -> csv_file
  • tensorflow utils -> configure_tensorflow
  • gpflow utils -> gp_flow
  • dask schelduer -> _dask
  • pymc iterator, utils -> _pymc, move utils to _pymc
  • All base classes to _

@maxdinkel maxdinkel force-pushed the quickfix-refactor-pickle-utils branch from dc03ab3 to 552b9d1 Compare February 28, 2025 15:46
@maxdinkel maxdinkel changed the title refactor: pickle utils refactor: naming of modules Feb 28, 2025
@maxdinkel
Copy link
Member Author

Cluster pipeline is fixed.

@@ -17,7 +17,7 @@
import numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rename this file to test_numpy_array.py?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
high priority High priority issue minor: refactor Minor refactor to circumvent the 24h rule
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants