Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First major ROLLO PR with source code, tests, and one example problem #6

Merged
merged 434 commits into from
Jun 1, 2021

Conversation

gwenchee
Copy link
Collaborator

This PR is huge because it contains all the source code and tests for v1.0 of ROLLO.

PR overview:
/docs/: barebones documentation (I will update this to provide examples and more explanation in a future PR)
/examples/: a single FHR slab ROLLO optimization problem
/rollo/: ROLLO's source code
/tests/: ROLLO unit test files

Seeing that this PR is so large, perhaps different people may review different parts of the PR.

Here's a visualization and table describing ROLLO's architecture:
Screen Shot 2021-05-19 at 11 14 40 AM
Screen Shot 2021-05-19 at 11 15 13 AM

@abachma2
Copy link
Contributor

I'll take number 3

Copy link
Contributor

@abachma2 abachma2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The part 3 files look good to me. Some comments to make things a little more clear and readable. Nice work!

@gwenchee
Copy link
Collaborator Author

Thanks for the review @abachma2, I have addressed your comments.

@gwenchee gwenchee requested a review from abachma2 May 19, 2021 20:55
Copy link
Contributor

@abachma2 abachma2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the changes! Looks great Gwen!

@gwenchee
Copy link
Collaborator Author

I deleted the /docs/ directory from this branch and will make a separate PR for documentation. There are no other changes.

Copy link
Contributor

@smpark7 smpark7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's my review. Looks good and mostly clean so far

gwenchee and others added 2 commits May 21, 2021 12:28
Co-authored-by: Sun Myung Park <park_cg@live.com>
@gwenchee gwenchee requested a review from smpark7 May 21, 2021 20:37
@gwenchee
Copy link
Collaborator Author

Thanks @smpark7!

Copy link
Contributor

@samgdotson samgdotson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't run the examples yet. I thought the names of some of the classes was kind of confusing. The hierarchy and the relationship between the classes is not obvious, to me. It's also not obvious what some of your tests are doing. There are a lot of integration tests and not a lot of unit tests.
What happens if someone writes an operator in a slighty different way than is present in the dictionary? This kind of error isn't handled and could become a headache for users.

You've obviously put a lot of thought and effort into this and I don't want to get in the way of the first big release so I'm not requesting changes. However, some of the items I mentioned could become separate issues that could be addressed later.

from mpi4py.futures import MPICommExecutor
except:
warnings.warn(
"Failed to import mpi4py. (Only important for parallel method: mpi_evals)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Failed to import mpi4py. (Only important for parallel method: mpi_evals)"
"Failed to import mpi4py. Multithreading disabled."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure my suggestion makes me happy, either. Basically, the warning could be more descriptive.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have 2 methods for parallelization that I will outline in the docs: multiprocessing and mpi_evals. mpi_evals is a unique implementation that is helpful when using a supercomputer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So then there are two tasks that need to be done:

  1. Add an explanation differentiating two types of parallelization in the docs.
  2. Make the warning clearer.

Maybe add "reverting to multiprocessing" or something to that effect.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup 1. is definitely being done. Yeah I agree about 2), let me make an issue for it: #9

)


class Algorithm(object):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Algorithm really the best name for this? This sounds like a container for simulation info. Maybe Simulation_Info or something similar, might be better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this runs (some might say evaluates... or executes... ) the algorithm, what is the purpose of the Evaluation class?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #6 (comment) to understand why I have an Executor class, an Algorithm class, and also an Evaluation class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by the documentation, then. Reading the docstrings from each class does not give a sense for the overall organization of the rollo package.

class Evaluation:    
    """
    Holds functions that generate and execute the evaluation solver's scripts.

    ..."""

class Algorithm(object):    
    """
    Builds and runs Genetic Algorithms.    
    This class holds functions to generate a generic genetic algorithm.

    ..."""

class Executor(object):    
    """
    Executes rollo simulation from start to finish.    
    Instances of this class can be used to perform a rollo run.
    ..."""

I'm not trying to be pedantic about this. It really isn't clear to me from reading the docstrings what each class should be used for. You have great explanations in a LaTeX table in your comment #6, those explanations should be inside the doc strings for your classes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay i hear you on that, I'll add more description for those classes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added more descriptions. I appreciate the pedantic-ness, it helps keep me accountable with my documentation. :)

os.remove("checkpoint.pkl")


def test_apply_mating_operator():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be docstrings explaining what the test does.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every unit test is named after the function in the source code that it tests. So, I think it is clear what it is doing. For example, the function test_apply_mating_operator in test_algorithm.py tests the function apply_mating_operator in algorithm.py.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For which cases? Only one case? The docstring doesn't need to be long, just enough to indicate what is being tested and what case it covers. For example:

def test_grid_optimize_1():    
"""    Case 1: Too many variables given    
"""    
    with pytest.raises(AssertionError):        
        loss = grid_optimizer(X_in.T,                              
        params,                              
        args=['trainlen', 'rho', 'noise'],                              
        xset=trainingLengths,                              
        verbose=False,                              
        visualize=False)
    return

Of course, writing tests takes time. I don't want to get in the way of progress so you may consider creating an issue for this. At the very least, you could number each test you have test_func_1 and add a docstring as shown in my example s.t. future tests can be added so that it's obvious what each test does and what edge/corner cases still need to be covered.
I think it's fine to not have a complete suite of tests in this first major PR. In fact, it provides a great opportunity for undergrads to practice writing tests. This has three benefits:

  1. Undergrads get experience coding
  2. You get a more robust software package
  3. Rollo gets an expanded user base

In sum: Number your tests, add docstrings.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you share the resource you used that suggests that tests should be numbered? I've not come across this style of test naming (I've been referring to openmc and numpy). Sure, I can make an issue to expand testing: #10

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at my own test doc string, I can see that it could be clearer also!
"Which variables? Oh, the 'args' parameter."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The numbering is probably just a "me" thing. Often a test will be something like: test_func_case followed by a docstring elaborating on the case. I couldn't come up with succinct enough test names so I tend to just number them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, in that case, I think i'm comfortable with my naming convention since it pairs with the function in the source file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But perhaps, I'll consider expanding to numbering as I or an undergrad addresses #10

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Sam's suggestion is sensible. If in the future you wanted to add more tests, you just add them, without changing the names. Also, in some cases, you could have interior, edge, and cases (see Katy's book). Having that either in the name of the function, with numbers, or the docstrings would be helpful.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is also a good idea to keep everything consistent. If you don't have any docstring for your tests, keep it that way, because changing it at this point might be too cumbersome. Also you could have an undergrad to do that, add docstrings to all the tests.

from deap import base, creator


def test_eval_fn_generator():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing as before. What does this test do?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

import time


class BackEnd(object):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of this class and the Algorithm class seems to have some overlap.

It's not totally clear what the distinction is between these two classes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #6 (comment) to understand why I have an Algorithm class, and BackEnd Class

Comment on lines +36 to +41
"<": operator.lt,
"<=": operator.le,
"==": operator.eq,
"!=": operator.ne,
">=": operator.ge,
">": operator.gt,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if someone accidentally enters != rather than !=? Is there a test for this (with an errant space)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, you have some tests for this as well. Although, my comment regarding cases and docstrings still applies.

@gwenchee
Copy link
Collaborator Author

Thanks for the review @samgdotson.

I haven't run the examples yet. I thought the names of some of the classes was kind of confusing. The hierarchy and the relationship between the classes is not obvious, to me.

An overview of the class relationships are outlined in the comment I included at the top of this PR: #6 (comment)

It's also not obvious what some of your tests are doing. There are a lot of integration tests and not a lot of unit tests.

Every single test is actually a unit test. As mentioned in the above comments, every unit test is named after the function in the source code that it tests. For example, the function test_apply_mating_operator in test_algorithm.py tests the function apply_mating_operator in algorithm.py.

@gwenchee gwenchee requested a review from samgdotson May 26, 2021 02:28
Copy link
Contributor

@samgdotson samgdotson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the strucure and design I can tell a lot of thought went into this project. It is well done. The changes I'm requesting are primarily for clarity.

  1. Class docstrings should have enough information to differentiate tasks among them.
  2. Tests should have a doc string and a case number.
  3. There should be more tests. I would be satisfied with an open issue on this point.

@gwenchee
Copy link
Collaborator Author

Thanks for the review @samgdotson .

  1. I've addressed this with updated docstrings.
  2. As for this one, could you share the resource you used that suggests that tests should be numbered? I've not come across this style of test naming (I've been referring to openmc and numpy).
  3. See Expand unit testing to cover more cases.  #10

@gwenchee gwenchee requested a review from samgdotson May 28, 2021 16:21
@robfairh robfairh removed their request for review May 28, 2021 16:24
@gwenchee
Copy link
Collaborator Author

If it's all good, can we merge? @samgdotson

@gwenchee
Copy link
Collaborator Author

gwenchee commented Jun 1, 2021

Hi @samgdotson @smpark7 @abachma2 could someone merge this?

@samgdotson samgdotson merged commit 896a996 into arfc:main Jun 1, 2021
@gwenchee gwenchee mentioned this pull request Sep 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Comp:Build This issue has to do with the build system of the code/doc (makefiles, cmake, install errors) Comp:Core This issue has to do with the main bulk of the code or document. (methods, main content) Comp:Input This issue has to do with the input component of the code or document. (input parameters, prep) Comp:Output This issue has to do with the output component of the code or document. (writing to databases, etc.) Difficulty:2-Challenging This issue may be complex or require specialized skills. Priority:2-Normal This work is important and should be completed ASAP.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants