-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New atom selection mechanism #672
Conversation
…to maciej/simplify-atom-selection
…to maciej/simplify-atom-selection
…to maciej/simplify-atom-selection
Looks good. Just one thing about the SMARTS pattern matching. In the RDKit molecule object we use the Chem.rdchem.BondType.UNSPECIFIED for all of the bondtypes. This means that most SMARTS patterns you'd think should work won't and that is why the SMARTS patterns we use are a bit unusual. The correct thing would be to set all the correct bond types and atom properties but I think this might get difficult to get working for a general molecule e.g. setting single/double/aromatic bonds for a benzene or cyclooctatetraene or some big protein. |
VALID_SELECTION = "Valid selection" | ||
USELESS_SELECTION = "Selection did not change. This operation is not needed." | ||
MALFORMED_SELECTION = "This is not a valid JSON string." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider making these an Enum
if self.mode_box.currentIndex() == 2: | ||
return "difference" | ||
elif self.mode_box.currentIndex() == 1: | ||
return "intersection" | ||
else: | ||
return "union" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be better as an Enum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One complication here is that these values have to be JSON encoded and decoded (and remain human-readable when saved as a text string). I am sure that it can be done. Do you think it is worth the effort?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually referring to the 1
, 2
...
If you had:
class Modes(Enum):
DIFFERENCE = auto()
INTERSECTION = auto()
UNION = auto()
...
if self.mode_box.currentIndex() == Modes.DIFFERENCE:
return "difference"
elif self.mode_box.currentIndex() == Modes.INTERSECTION:
return "intersection"
elif self.mode_box.currentIndex() == Modes.UNION:
return "union"
It removes magic numbers and ensures internal consistency a bit better if we ever add other operations.
As an aside, Enum
s can act as strings
self.selection_field.setPlaceholderText("0,1,2") | ||
self.selection_keyword = "index_list" | ||
self.selection_separator = "," | ||
if new_mode == "range": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elif
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also probably Enum
…to maciej/simplify-atom-selection
Co-authored-by: Jacob Wilkins <46597752+oerc0122@users.noreply.github.com> Signed-off-by: Maciej Bartkowiak <108934199+MBartkowiakSTFC@users.noreply.github.com>
…SISNeutronMuon/MDANSE into maciej/simplify-atom-selection
…to maciej/simplify-atom-selection
|
||
|
||
def select_atoms( | ||
trajectory: Trajectory, **function_parameters: Dict[str, Any] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's currently recommended that for **kwargs
you document the types of any variables that they might contain.
https://peps.python.org/pep-0484/#arbitrary-argument-lists-and-default-argument-values
Set[int] | ||
Set of all the atom indices | ||
""" | ||
return set(range(len(trajectory.chemical_system.atom_list))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we want a utility function for set(range(len(...)))
or if there's a nicer way of writing it.
|
||
|
||
def select_all( | ||
trajectory: Trajectory, **function_parameters: Dict[str, Any] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For cases where the arguments are not used internally, it's recommended that they are prefixed with an _
trajectory : Trajectory | ||
A trajectory instance to which the selection is applied | ||
function_parameters : Dict[str, Any] | ||
should include a list of str molecule names under key "atom_labels" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are sensible options, they should be in the function signature, i.e.
def select_molecules(
trajectory: Trajectory, *,
molecule_names: Sequence[str] | None = None,
**_function_parameters: typing.Never
) -> Set[int]:
(The lone *
is to enforce molecule_names
should only be passed via kwargs).
self.operations = {} | ||
|
||
def set_selection( | ||
self, number: Union[int, None] = None, function_parameters: Dict[str, Any] = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should function parameters be **
here?
@@ -137,9 +135,8 @@ def update_transmutation_textbox(self) -> None: | |||
map = self.transmuter.get_setting() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not you, but map
here is shadowing the python function.
Co-authored-by: Jacob Wilkins <46597752+oerc0122@users.noreply.github.com> Signed-off-by: Maciej Bartkowiak <108934199+MBartkowiakSTFC@users.noreply.github.com>
…to maciej/simplify-atom-selection
…SISNeutronMuon/MDANSE into maciej/simplify-atom-selection
index_list: Sequence[int] = None, | ||
index_range: Sequence[int] = None, | ||
index_slice: Sequence[int] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Structly pep-484 doesn't allow implicit None
.
C.f. the end of https://peps.python.org/pep-0484/#union-types
""" | ||
selection = set() | ||
system = trajectory.chemical_system | ||
if rdkit_pattern: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to have a default here?
set[int] | ||
The atoms indices. | ||
bool | ||
True if the selection adds atoms, False otherwise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True if the selection adds atoms, False otherwise | |
True if the operation changes selection, False otherwise |
if operation_type == "union": | ||
selection = selection.union(temp_selection) | ||
elif operation_type == "intersection": | ||
selection = selection.intersection(temp_selection) | ||
elif operation_type == "difference": | ||
selection = selection.difference(temp_selection) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be either
if operation_type == "union": | |
selection = selection.union(temp_selection) | |
elif operation_type == "intersection": | |
selection = selection.intersection(temp_selection) | |
elif operation_type == "difference": | |
selection = selection.difference(temp_selection) | |
if operation_type == "union": | |
selection |= temp_selection | |
elif operation_type == "intersection": | |
selection &= temp_selection | |
elif operation_type == "difference": | |
selection -= temp_selection |
Or the shorter, but slightly more esoteric
if operation_type == "union": | |
selection = selection.union(temp_selection) | |
elif operation_type == "intersection": | |
selection = selection.intersection(temp_selection) | |
elif operation_type == "difference": | |
selection = selection.difference(temp_selection) | |
if operation_type in ("union", "intersection", "difference"): | |
selection = getattr(selection, operation_type)(temp_selection) |
Returns | ||
------- | ||
Set[int] | ||
_description_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring
@@ -132,14 +134,11 @@ def get_information(self) -> str: | |||
|
|||
return "\n".join(info) + "\n" | |||
|
|||
def get_selector(self) -> Selector: | |||
def get_selector(self) -> "ReusableSelection": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I wrong in thinking this is imported?
def get_selector(self) -> "ReusableSelection": | |
def get_selector(self) -> ReusableSelection: |
class SelectionValidity(StrEnum): | ||
VALID_SELECTION = "Valid selection" | ||
USELESS_SELECTION = "Selection did not change. This operation is not needed." | ||
MALFORMED_SELECTION = "This is not a valid JSON string." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MALFORMED_SELECTION = "This is not a valid JSON string." | |
MALFORMED_SELECTION = "This is not a valid selection string." |
…to maciej/simplify-atom-selection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me if it does what it's meant to, just a couple of possible stylistic suggestions. Feel free to ignore if you feel they remove clarity.
if number is None: | ||
number = len(self.operations) | ||
else: | ||
number = int(number) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's not too long an option is:
if number is None: | |
number = len(self.operations) | |
else: | |
number = int(number) | |
number = int(number) if number is not None else len(self.operations) |
return bool( | ||
( | ||
len(selection.difference(current_selection)) > 0 | ||
and operation_type == "union" | ||
) | ||
or ( | ||
len(current_selection.difference(selection)) > 0 | ||
and operation_type != "union" | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Give if len(Collection) > 0
is equivalent to if Collection
we can simplify this to:
return bool( | |
( | |
len(selection.difference(current_selection)) > 0 | |
and operation_type == "union" | |
) | |
or ( | |
len(current_selection.difference(selection)) > 0 | |
and operation_type != "union" | |
) | |
) | |
return ((selection - current_selection) and operation_type == "union") or | |
(current_selection - selection) and operation_type != "union")) |
if isinstance(v0, dict): | ||
self.set_selection(number=k0, function_parameters=v0) | ||
else: | ||
raise TypeError(f"Selection {v0} is not a dictionary.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if isinstance(v0, dict): | |
self.set_selection(number=k0, function_parameters=v0) | |
else: | |
raise TypeError(f"Selection {v0} is not a dictionary.") | |
if not isinstance(v0, dict): | |
raise TypeError(f"Selection {v0} is not a dictionary.") | |
self.set_selection(number=k0, function_parameters=v0) |
if position_minimum is None: | ||
lower_limits = np.array(3 * [-np.inf]) | ||
else: | ||
lower_limits = np.array(position_minimum) | ||
if position_maximum is None: | ||
upper_limits = np.array(3 * [np.inf]) | ||
else: | ||
upper_limits = np.array(position_maximum) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if position_minimum is None: | |
lower_limits = np.array(3 * [-np.inf]) | |
else: | |
lower_limits = np.array(position_minimum) | |
if position_maximum is None: | |
upper_limits = np.array(3 * [np.inf]) | |
else: | |
upper_limits = np.array(position_maximum) | |
lower_limits = (np.array(position_minimum) | |
if position_minimum is not None else | |
np.array([-np.inf] * 3)) | |
upper_limits = (np.array(position_maximum) | |
if position_maximum is not None else | |
np.array([np.inf] * 3)) |
dict[str, list[int]] | ||
For each atom type, a list of indices of selected atoms | ||
|
||
""" | ||
indicesPerElement = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be simplified to:
indicesPerElement = defaultdict(list)
for i, v in enumerate(self["names"]):
indicesPerElement[v].extend(self["indices"][i])
Some issues I've found and comments.
|
Fixed.
I could argue that it is an operation on all atoms, since
Now that the selection is a sequence of operations, it is less clear how the clicks should be handled. Two options I can think of:
This is weird and I have not found the reason for it.
Fixed.
Most likely I will need to subclass QListView to fix this. For now I will disable dragging.
I could make sure that the selection widget always starts with an explicit "select all" instead of being empty. Because at the moment empty selection is assumed to mean "select all", but the first selection replaces it. |
Since we are letting the user enter SMARTS patterns for matching, I think we need to try to tackle #286 at some point. Currently, we don't have bonding information, so users can only use "C~C" for two carbon atoms bonded using any bond ("~") and things like "c:c" (aromatic carbons joined by an aromatic bond) and "c-c" (aromatic carbons joined by a single bond (e.g. biphenyl)) currently do not work. |
OK, I think that I managed to improve matters regarding points 1, 4, 5, 6 and 7. |
Description of work
The intention behind this PR is to implement a mechanism of selecting atoms which is:
We can still discuss if the implementation in this PR meets the requirements.
This screenshot shows the carbon dioxide trajectory in which two out of three atoms have been selected in the molecules from the first half of the index range. This is not scientifically meaningful, but illustrates how the selection operations can be combined.

Fixes
The JSON representation is saved as str:


The indices are saved as a mask array:
To test
All tests must pass.