-
Notifications
You must be signed in to change notification settings - Fork 324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pytx] Add Tech Against Terrorism Hash API #1617
Conversation
Thank you for your pull request and welcome to our community. Action RequiredIn order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you. ProcessIn order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks! |
Hey Bruce! Great to be working with you! I will help you get this landed and review it. As for the details of the API, I did read your dos and though you picked a bit of a challenging implementation, but I think we can make it work! I'll give you some tips on the PR itself. |
Hey David, likewise! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks pretty good!
Thank you for including your manual test steps in the PR summary!
The one blocking thing I have is that I think the interface will become easier if you standardize on throwing exceptions instead of catching them and return a new object.
Let me know if you are short on time or energy, I'm willing to do the labor to bring this in the rest of the way if you are short on either of those.
# Maybe move to a common library someday | ||
from threatexchange.exchanges.clients.fb_threatexchange.api import TimeoutHTTPAdapter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ignorable: If you want to move this to a common helper library under clients, now seems like a reasonable time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved 👍
|
||
@dataclass | ||
class TATUser: | ||
user: Any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to narrow this anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our Actual User object can be quite large (possibly hundreds of permissions) and will vary in size based on who is using the interface. I can narrow this down to Dict[str, Any]
|
||
This API delivers a JSON file containing a comprehensive list of all hashed terrorist content within the TAT system. | ||
|
||
The list is refreshed daily. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blocking: If you have public documentation, can you also add a link to it here?
python-threatexchange/threatexchange/exchanges/clients/techagainstterrorism/api.py
Show resolved
Hide resolved
|
||
except Exception as e: | ||
logging.error("Error authenticating with TCAP: %s", e) | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blocking question: There's no recovery from this, right? Why not let the exception leak out?
response.raise_for_status() | ||
return response.json() | ||
|
||
def authenticate(self, username: str, password: str) -> t.Optional[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would personally rename this get_auth_token
|
||
endpoint = f"{TATEndpoint.hash_list.value}/{ideology}" | ||
|
||
print("ENDPOINT", endpoint) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blocking: No stdout in libraries please
except requests.exceptions.HTTPError as http_err: | ||
return TATAPIErrorResponse(error=str(http_err)) | ||
|
||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blocking q: Why not let exceptions leak out? This is exceptional circumstances, and I think the other clients in this library except on error. Some libraries are very anti-exception, but not this one :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback!
Forgive me it's my first time creating/contributing to a package I have mostly worked on web apps where we catch and handle everything!
Today I have learned it's better practice in libraries to let the exception leak up the stack 🙇 awesome!
|
||
def get_hash_list( | ||
self, ideology: str = TATIdeology._all.value | ||
) -> t.Union[TATHashListResponse, TATAPIErrorResponse]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blocking: This is a challenging type to deal with on the caller side! The simplest solution is to just throw exceptions. Exceptions for for exceptional circumstances, and failing to fetch seems to match that.
python-threatexchange/threatexchange/exchanges/clients/techagainstterrorism/tests/test_api.py
Show resolved
Hide resolved
Thanks! I have moved the Idea 🤔 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idea 🤔
I have noticed across all clients we're using _get_session within a content manager
Maybe we could lift this up into a parent class ? PyTxClientAPIBase for the lack of a better definition or maybe introduce an Abstract Base Class for the places where client are using _get _post. Just a thought 😄
It's an interesting idea! Let's first get your SignalExchangeAPI implementation working, and we can come back to it.
You only have ONE thing left to fix - the lint error on running black!
@@ -0,0 +1,17 @@ | |||
from requests.adapters import HTTPAdapter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I git an internal nagger that forces me to add
# Copyright (c) Meta Platforms, Inc. and affiliates.
to the tops of all files, though I can do it as a followup
@@ -0,0 +1,94 @@ | |||
import typing as t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I git an internal nagger that forces me to add
# Copyright (c) Meta Platforms, Inc. and affiliates.
to the tops of all files, though I can do it as a followup
python-threatexchange/threatexchange/exchanges/clients/techagainstterrorism/api.py
Show resolved
Hide resolved
|
||
except Exception as exception: | ||
logging.error("Failed to get hash list: %s", exception) | ||
raise Exception("Failed to fetch Hash List: %s", exception) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blocking (though I'm fine fixing it on a followup): Instead of wrapping this exception, rethrow the one on stack by just doing
raise
Ref: https://docs.python.org/3/tutorial/errors.html#enriching-exceptions-with-notes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was in the code so I made the change. Thanks for the reference!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Summary
Hi
Bruce here from Tech Against Terrorism 👋
This PR contains Step 1 from this issue integrating the Tech Against Terrorism Hash List API into
python-threatexchange
I used StopNSCII & NCMEC inside
/client/exchanges/
as initial guidance.Caveat
Our API returns a pre-signed URL to a JSON file for our hash list, instead of paginated results. I'm wondering how we want to handle this moving forward into part 2 (Once we're happy with part 1) ?
Test Plan
Manual
For manual testing you can use python shell to simply instantiate the
TATHashListAPI
with your TCAP username and password.Automated
I have written unit tests
techagainstterrorism/tests
sopytest
will do it.