Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New atom selection mechanism #672

Merged
merged 46 commits into from
Mar 7, 2025
Merged

Conversation

MBartkowiakSTFC
Copy link
Collaborator

Description of work
The intention behind this PR is to implement a mechanism of selecting atoms which is:

  1. Not attached to a specific trajectory file,
  2. Transferable,
  3. Easy to extend.

We can still discuss if the implementation in this PR meets the requirements.

This screenshot shows the carbon dioxide trajectory in which two out of three atoms have been selected in the molecules from the first half of the index range. This is not scientifically meaningful, but illustrates how the selection operations can be combined.
selection_helper

Fixes

  1. New selection functions have been implemented to replace the classes we have been using so far.
  2. The output of each selection function is a set of indices.
  3. Selections are applied sequentially, and can be combined using set operations (union, intersection, difference).
  4. ReusableSelection class stores a sequence of selection functions and uses JSON for saving and loading a sequence.
  5. ReusableSelection.select_in_trajectory method takes a Trajectory instance as input, applies the selection and returns the resulting set of atom indices.
  6. Analysis jobs save the JSON selection sequence as one of the job inputs, and an array of 0/1 values to mark the indices which got selected.

The JSON representation is saved as str:
saved_input
The indices are saved as a mask array:
saved_selection_array

To test
All tests must pass.

@MBartkowiakSTFC MBartkowiakSTFC marked this pull request as draft February 14, 2025 17:12
@ChiCheng45
Copy link
Collaborator

ChiCheng45 commented Feb 18, 2025

Looks good. Just one thing about the SMARTS pattern matching. In the RDKit molecule object we use the Chem.rdchem.BondType.UNSPECIFIED for all of the bondtypes. This means that most SMARTS patterns you'd think should work won't and that is why the SMARTS patterns we use are a bit unusual.

The correct thing would be to set all the correct bond types and atom properties but I think this might get difficult to get working for a general molecule e.g. setting single/double/aromatic bonds for a benzene or cyclooctatetraene or some big protein.

Comment on lines 51 to 53
VALID_SELECTION = "Valid selection"
USELESS_SELECTION = "Selection did not change. This operation is not needed."
MALFORMED_SELECTION = "This is not a valid JSON string."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider making these an Enum

Comment on lines 65 to 70
if self.mode_box.currentIndex() == 2:
return "difference"
elif self.mode_box.currentIndex() == 1:
return "intersection"
else:
return "union"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better as an Enum

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One complication here is that these values have to be JSON encoded and decoded (and remain human-readable when saved as a text string). I am sure that it can be done. Do you think it is worth the effort?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually referring to the 1, 2...
If you had:

class Modes(Enum):
    DIFFERENCE = auto()
    INTERSECTION = auto()
    UNION = auto()
...

        if self.mode_box.currentIndex() == Modes.DIFFERENCE:
            return "difference"
        elif self.mode_box.currentIndex() == Modes.INTERSECTION:
            return "intersection"
        elif self.mode_box.currentIndex() == Modes.UNION:
            return "union"

It removes magic numbers and ensures internal consistency a bit better if we ever add other operations.

As an aside, Enums can act as strings

self.selection_field.setPlaceholderText("0,1,2")
self.selection_keyword = "index_list"
self.selection_separator = ","
if new_mode == "range":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elif?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also probably Enum



def select_atoms(
trajectory: Trajectory, **function_parameters: Dict[str, Any]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's currently recommended that for **kwargs you document the types of any variables that they might contain.

https://peps.python.org/pep-0484/#arbitrary-argument-lists-and-default-argument-values

Set[int]
Set of all the atom indices
"""
return set(range(len(trajectory.chemical_system.atom_list)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we want a utility function for set(range(len(...))) or if there's a nicer way of writing it.



def select_all(
trajectory: Trajectory, **function_parameters: Dict[str, Any]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For cases where the arguments are not used internally, it's recommended that they are prefixed with an _

trajectory : Trajectory
A trajectory instance to which the selection is applied
function_parameters : Dict[str, Any]
should include a list of str molecule names under key "atom_labels"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are sensible options, they should be in the function signature, i.e.

def select_molecules(
    trajectory: Trajectory, *, 
    molecule_names: Sequence[str] | None = None, 
    **_function_parameters: typing.Never
) -> Set[int]:

(The lone * is to enforce molecule_names should only be passed via kwargs).

self.operations = {}

def set_selection(
self, number: Union[int, None] = None, function_parameters: Dict[str, Any] = {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should function parameters be ** here?

@@ -137,9 +135,8 @@ def update_transmutation_textbox(self) -> None:
map = self.transmuter.get_setting()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not you, but map here is shadowing the python function.

MBartkowiakSTFC and others added 6 commits February 28, 2025 18:08
Co-authored-by: Jacob Wilkins <46597752+oerc0122@users.noreply.github.com>
Signed-off-by: Maciej Bartkowiak <108934199+MBartkowiakSTFC@users.noreply.github.com>
Comment on lines 25 to 27
index_list: Sequence[int] = None,
index_range: Sequence[int] = None,
index_slice: Sequence[int] = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Structly pep-484 doesn't allow implicit None.

C.f. the end of https://peps.python.org/pep-0484/#union-types

"""
selection = set()
system = trajectory.chemical_system
if rdkit_pattern:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to have a default here?

set[int]
The atoms indices.
bool
True if the selection adds atoms, False otherwise
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
True if the selection adds atoms, False otherwise
True if the operation changes selection, False otherwise

Comment on lines 124 to 129
if operation_type == "union":
selection = selection.union(temp_selection)
elif operation_type == "intersection":
selection = selection.intersection(temp_selection)
elif operation_type == "difference":
selection = selection.difference(temp_selection)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be either

Suggested change
if operation_type == "union":
selection = selection.union(temp_selection)
elif operation_type == "intersection":
selection = selection.intersection(temp_selection)
elif operation_type == "difference":
selection = selection.difference(temp_selection)
if operation_type == "union":
selection |= temp_selection
elif operation_type == "intersection":
selection &= temp_selection
elif operation_type == "difference":
selection -= temp_selection

Or the shorter, but slightly more esoteric

Suggested change
if operation_type == "union":
selection = selection.union(temp_selection)
elif operation_type == "intersection":
selection = selection.intersection(temp_selection)
elif operation_type == "difference":
selection = selection.difference(temp_selection)
if operation_type in ("union", "intersection", "difference"):
selection = getattr(selection, operation_type)(temp_selection)

Returns
-------
Set[int]
_description_
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring

@@ -132,14 +134,11 @@ def get_information(self) -> str:

return "\n".join(info) + "\n"

def get_selector(self) -> Selector:
def get_selector(self) -> "ReusableSelection":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I wrong in thinking this is imported?

Suggested change
def get_selector(self) -> "ReusableSelection":
def get_selector(self) -> ReusableSelection:

class SelectionValidity(StrEnum):
VALID_SELECTION = "Valid selection"
USELESS_SELECTION = "Selection did not change. This operation is not needed."
MALFORMED_SELECTION = "This is not a valid JSON string."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MALFORMED_SELECTION = "This is not a valid JSON string."
MALFORMED_SELECTION = "This is not a valid selection string."

Copy link
Collaborator

@oerc0122 oerc0122 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me if it does what it's meant to, just a couple of possible stylistic suggestions. Feel free to ignore if you feel they remove clarity.

Comment on lines 93 to 96
if number is None:
number = len(self.operations)
else:
number = int(number)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not too long an option is:

Suggested change
if number is None:
number = len(self.operations)
else:
number = int(number)
number = int(number) if number is not None else len(self.operations)

Comment on lines 174 to 183
return bool(
(
len(selection.difference(current_selection)) > 0
and operation_type == "union"
)
or (
len(current_selection.difference(selection)) > 0
and operation_type != "union"
)
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give if len(Collection) > 0 is equivalent to if Collection we can simplify this to:

Suggested change
return bool(
(
len(selection.difference(current_selection)) > 0
and operation_type == "union"
)
or (
len(current_selection.difference(selection)) > 0
and operation_type != "union"
)
)
return ((selection - current_selection) and operation_type == "union") or
(current_selection - selection) and operation_type != "union"))

Comment on lines 243 to 246
if isinstance(v0, dict):
self.set_selection(number=k0, function_parameters=v0)
else:
raise TypeError(f"Selection {v0} is not a dictionary.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if isinstance(v0, dict):
self.set_selection(number=k0, function_parameters=v0)
else:
raise TypeError(f"Selection {v0} is not a dictionary.")
if not isinstance(v0, dict):
raise TypeError(f"Selection {v0} is not a dictionary.")
self.set_selection(number=k0, function_parameters=v0)

Comment on lines 56 to 63
if position_minimum is None:
lower_limits = np.array(3 * [-np.inf])
else:
lower_limits = np.array(position_minimum)
if position_maximum is None:
upper_limits = np.array(3 * [np.inf])
else:
upper_limits = np.array(position_maximum)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if position_minimum is None:
lower_limits = np.array(3 * [-np.inf])
else:
lower_limits = np.array(position_minimum)
if position_maximum is None:
upper_limits = np.array(3 * [np.inf])
else:
upper_limits = np.array(position_maximum)
lower_limits = (np.array(position_minimum)
if position_minimum is not None else
np.array([-np.inf] * 3))
upper_limits = (np.array(position_maximum)
if position_maximum is not None else
np.array([np.inf] * 3))

dict[str, list[int]]
For each atom type, a list of indices of selected atoms

"""
indicesPerElement = {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be simplified to:

indicesPerElement = defaultdict(list)
for i, v in enumerate(self["names"]):
    indicesPerElement[v].extend(self["indices"][i])

@ChiCheng45
Copy link
Collaborator

ChiCheng45 commented Mar 6, 2025

Some issues I've found and comments.

  1. The default selection string in the gui {"all": true}, this causes the job to crash.
  2. Can the invert button be on its own line? It's slightly confusing as it's in the ALL atoms section.
  3. Selecting or deselecting in the 3D view doesn't update the selection setting.
  4. Odd behaviours with the atom transmutation and the reset.

Animation64

  1. Invalid SMARTS causes the job to fail.
  2. The setting dragging causes the setting to disappear.

Animation65

  1. The setting {"0": {"function_name": "invert_selection"}} is ambiguous; at the moment, it's equivalent to selecting all atoms. I guess this is fine but maybe it should be an invalid setting similarly to something like {"0": {"function_name": "select_atoms", "atom_types": ["H"], "operation_type": "difference"}}.

@MBartkowiakSTFC
Copy link
Collaborator Author

Some issues I've found and comments.

  1. The default selection string in the gui {"all": true}, this causes the job to crash.

Fixed.

  1. Can the invert button be on its own line? It's slightly confusing as it's in the ALL atoms section.

I could argue that it is an operation on all atoms, since inverted selection = ALL atom indices - current selection
But this is mainly because I think it would take a lot of space in the layout, will be different to all the other widgets (since it is just one button) and it may still be hard to find after all this.

  1. Selecting or deselecting in the 3D view doesn't update the selection setting.

Now that the selection is a sequence of operations, it is less clear how the clicks should be handled. Two options I can think of:

  • each click creates its own selection operation on a single atom
  • while you are clicking, indices are added to a single operation. If you do something else in the meantime, a new selection operation will be added next time you click.
    Suggestions welcome.
  1. Odd behaviours with the atom transmutation and the reset.

This is weird and I have not found the reason for it.

  1. Invalid SMARTS causes the job to fail.

Fixed.

  1. The setting dragging causes the setting to disappear.

Most likely I will need to subclass QListView to fix this. For now I will disable dragging.

  1. The setting {"0": {"function_name": "invert_selection"}} is ambiguous; at the moment, it's equivalent to selecting all atoms. I guess this is fine but maybe it should be an invalid setting similarly to something like {"0": {"function_name": "select_atoms", "atom_types": ["H"], "operation_type": "difference"}}.

I could make sure that the selection widget always starts with an explicit "select all" instead of being empty. Because at the moment empty selection is assumed to mean "select all", but the first selection replaces it.

@ChiCheng45
Copy link
Collaborator

ChiCheng45 commented Mar 6, 2025

Since we are letting the user enter SMARTS patterns for matching, I think we need to try to tackle #286 at some point. Currently, we don't have bonding information, so users can only use "C~C" for two carbon atoms bonded using any bond ("~") and things like "c:c" (aromatic carbons joined by an aromatic bond) and "c-c" (aromatic carbons joined by a single bond (e.g. biphenyl)) currently do not work.

@MBartkowiakSTFC
Copy link
Collaborator Author

Some issues I've found and comments.

  1. The default selection string in the gui {"all": true}, this causes the job to crash.
  2. Can the invert button be on its own line? It's slightly confusing as it's in the ALL atoms section.
  3. Selecting or deselecting in the 3D view doesn't update the selection setting.
  4. Odd behaviours with the atom transmutation and the reset.
  5. Invalid SMARTS causes the job to fail.
  6. The setting dragging causes the setting to disappear.
  7. The setting {"0": {"function_name": "invert_selection"}} is ambiguous; at the moment, it's equivalent to selecting all atoms. I guess this is fine but maybe it should be an invalid setting similarly to something like {"0": {"function_name": "select_atoms", "atom_types": ["H"], "operation_type": "difference"}}.

OK, I think that I managed to improve matters regarding points 1, 4, 5, 6 and 7.
I am happy to ignore point 2 for now, just because it is not urgent. We can agree on a solution and fix it.
Point 3 will require more work. I think I will open a new issue.

@MBartkowiakSTFC MBartkowiakSTFC marked this pull request as ready for review March 6, 2025 17:25
@MBartkowiakSTFC MBartkowiakSTFC merged commit 1dda714 into protos Mar 7, 2025
38 checks passed
@MBartkowiakSTFC MBartkowiakSTFC deleted the maciej/simplify-atom-selection branch March 7, 2025 09:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants