Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use query parameters wherever possible in Neo4jStore #330

Merged
merged 11 commits into from
Nov 19, 2024

Conversation

LilDojd
Copy link
Contributor

@LilDojd LilDojd commented Nov 16, 2024

This PR closes #259 by replacing f-string substitutions with parameterized queries in the Neo4jStore class. The changes should slightly improve query performance but the main goal of the PR is to make Cypher injections impossible

Changes Made

  • Added parametrized queries where they were missing
  • Added a test to showcase how previous queries could be abused for unintended behavior 10456b9
  • When building a ScopedKey, GufeKey is validated for common exploits, including whitespace checks, unicode encoding stuff and general adherence to GufeKey format 77d3cde dd65007 99034dc
  • _generate_claim_query is replaced with a const str CLAIM_QUERY. Parameters are now supplied on-site when performing a transaction
  • In cases where queries contained statuses from the TaskStatusEnum, f-strings are now consistently used for ease of work with LSP

Please let me know if I have missed anything and thanks!

Copy link
Member

@ianmkenney ianmkenney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LilDojd Thanks so much for this PR. Please see my review comments---I have a couple of observations/suggestions.

alchemiscale/models.py Outdated Show resolved Hide resolved
alchemiscale/models.py Outdated Show resolved Hide resolved
alchemiscale/models.py Show resolved Hide resolved
alchemiscale/storage/statestore.py Show resolved Hide resolved
alchemiscale/storage/statestore.py Outdated Show resolved Hide resolved
@@ -2229,7 +2238,9 @@ def set_task_priority(
RETURN scoped_key, t
"""
res = tx.run(
q, scoped_keys=[str(t) for t in tasks], priority=priority
q,
scoped_keys=[str(t) for t in tasks if t is not None],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tasks will not have (🤞) None elements based on the type annotation. I'd say it's safe to keep this as is, but I also don't think it will hurt. Maybe @dotsdl has a differing opinion here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LilDojd did you notice a case where inputs to this method may include None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really.

But since you have used cypher_list_from_scoped_keys(tasks), which filters out Nones internally, in action_tasks, and it, too, has List[ScopedKey] in the signature for tasks, I thought it was better to take a precaution

def action_tasks(
self,
tasks: List[ScopedKey],
taskhub: ScopedKey,
) -> List[Union[ScopedKey, None]]:
"""Add Tasks to the TaskHub for a given AlchemicalNetwork.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, there would be no Nones on the server, since

asc.set_tasks_priority(tasks + [None], 1)

Would just throw
AttributeError: 'NoneType' object has no attribute 'dict' client-side

async def _set_task_priority(
self, tasks: List[ScopedKey], priority: int
) -> List[Optional[ScopedKey]]:
data = dict(tasks=[t.dict() for t in tasks], priority=priority)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah cool, thank you for giving your rationale @LilDojd! I see no harm in leaving this as you have done it here, as we have in action_tasks. Up to you if you'd like to leave it in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have decided to remove None checks for cleanliness. Will save a few clock cycles :)

@@ -2266,7 +2277,7 @@ def get_task_priority(self, tasks: List[ScopedKey]) -> List[Optional[int]]:
WHERE t._scoped_key = scoped_key
RETURN t.priority as priority
"""
res = tx.run(q, scoped_keys=[str(t) for t in tasks])
res = tx.run(q, scoped_keys=[str(t) for t in tasks if t is not None])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: annotation + potential @dotsdl opinion

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, I don't mind keeping the if t is not None, but agree it isn't necessary here. Up to @LilDojd what he would like to do.

assert len(n4js.query_transformations(scope=multiple_scopes[0])) == 0

mark = n4js._query(qualname="InjectionMark")
assert len(mark) == 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to the *query* methods from here down, we should also check directly with cypher via the n4js.execute_query method, even though it appears to be duplicated work. I think this issue is important enough to justify explicitly checking that the convenience query methods are consistent.

MATCH (n:InjectionMark)
RETURN n

@ianmkenney ianmkenney requested a review from dotsdl November 18, 2024 18:53
Copy link

codecov bot commented Nov 18, 2024

Codecov Report

Attention: Patch coverage is 95.69892% with 4 lines in your changes missing coverage. Please review.

Project coverage is 80.16%. Comparing base (4926573) to head (4848236).
Report is 176 commits behind head on main.

Files with missing lines Patch % Lines
alchemiscale/storage/statestore.py 95.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #330      +/-   ##
==========================================
+ Coverage   80.10%   80.16%   +0.06%     
==========================================
  Files          27       27              
  Lines        3478     3484       +6     
==========================================
+ Hits         2786     2793       +7     
+ Misses        692      691       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@LilDojd
Copy link
Contributor Author

LilDojd commented Nov 18, 2024

Thank you for your comments, @ianmkenney
It's a bit late, so I will address them tomorrow when I get the time

Co-authored-by: Ian Kenney <ianmichaelkenney@gmail.com>
Copy link
Member

@dotsdl dotsdl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic work @LilDojd! Seriously thank you so much for this!

Just a few comments from me, but nothing blocking in my view once @ianmkenney's comments are addressed!

alchemiscale/storage/statestore.py Show resolved Hide resolved
alchemiscale/storage/statestore.py Show resolved Hide resolved
RETURN task, ar
"""
results.append(tx.run(q).to_eager_result())
tasks_list = [{"task": str(t), "weight": w} for t, w in tasks.items()]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant!

Comment on lines +1558 to +1559
OPTIONAL MATCH (th:TaskHub {_scoped_key: $taskhub})-[ar:ACTIONS]->(task:Task {_scoped_key: task_scoped_key})
RETURN task_scoped_key, ar.weight AS weight
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, this should preserve the behavior that if there is no matching Task actioned on the TaskHub then weight should come back as None.

@@ -2229,7 +2238,9 @@ def set_task_priority(
RETURN scoped_key, t
"""
res = tx.run(
q, scoped_keys=[str(t) for t in tasks], priority=priority
q,
scoped_keys=[str(t) for t in tasks if t is not None],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah cool, thank you for giving your rationale @LilDojd! I see no harm in leaving this as you have done it here, as we have in action_tasks. Up to you if you'd like to leave it in.

@@ -2266,7 +2277,7 @@ def get_task_priority(self, tasks: List[ScopedKey]) -> List[Optional[int]]:
WHERE t._scoped_key = scoped_key
RETURN t.priority as priority
"""
res = tx.run(q, scoped_keys=[str(t) for t in tasks])
res = tx.run(q, scoped_keys=[str(t) for t in tasks if t is not None])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, I don't mind keeping the if t is not None, but agree it isn't necessary here. Up to @LilDojd what he would like to do.

alchemiscale/storage/statestore.py Show resolved Hide resolved
@LilDojd
Copy link
Contributor Author

LilDojd commented Nov 19, 2024

Should be taken care of now, please do check. Thanks!

@ianmkenney ianmkenney merged commit 49182dc into OpenFreeEnergy:main Nov 19, 2024
7 checks passed
@ianmkenney
Copy link
Member

@LilDojd Thanks for bringing this to our attention and providing this great solution to the issue!

@LilDojd LilDojd deleted the feat/query-params branch November 20, 2024 06:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use query parameters wherever possible in Neo4jStore instead of f-string substitutions
3 participants