Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TestConnectionOperator #47082

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

fritz-astronomer
Copy link
Contributor

Given the default Test Connection method in the Airflow UI being disabled, I find myself frequently writing similar code to this, while implementing a "Canary DAG" patterns

Additionally - as an operator, there is the added benefit of a standard method of testing connections that may be defined via other mechanisms, and having a clear procedure that can be easily recommended to new users to help debug connection woes and test things like networking configurations.


Remaining todo:

  • docs
  • tests

@fritz-astronomer fritz-astronomer marked this pull request as draft February 25, 2025 21:22
@eladkal
Copy link
Contributor

eladkal commented Feb 25, 2025

having a clear procedure that can be easily recommended to new users to help debug connection woes and test things like networking configurations.

Why not just use the operator they actually want to use?

I am not convinced about this feature

@fritz-astronomer
Copy link
Contributor Author

fritz-astronomer commented Feb 25, 2025

@eladkal my thoughts:

  • establishing/testing connections is something I see Airflow users struggle with constantly, and it can often be pretty frustrating/obscure what is or isn't working, especially for someone new to Airflow. Isolating just the connection can simplify the process and should be encouraged and easy
  • giving users the tools to spend less time debugging and make a difficult process easy is useful
  • the canary pattern wants to exercise the connection in a meaningful but non-impactful way (e.g. select 1; not select * from my_table limit ...). It isn't always immediately obvious how to do that, when establishing a connection to a system that a user may only have partial familiarity with, or users may make a disruptive test by accident
    • this could also be a great @setup task, or something to encourage at the start of a DAG, e.g. to see if the database actually is up and exists
  • The functionality for test_connection is already there for many hooks/connection types, and includes the correct tests, so I'd say "D.R.Y.". Why make users reimplement a user-friendly feature?
  • not all connections are in the Airflow UI (e.g. a secrets backend), so this has additional benefit for testing those connections, and the ability to retrieve them. The pattern also functions as a way to document what connections exist and are functional for the Airflow instance
  • that codepath is often disabled / unusable via the AF Connections UI. Often users don't even know that exists, nor why it's disabled, nor how to enable it or the implications of enabling it instance-wide. I'd say this is good for security while preserving a friendly UX that had been previously exposed

One change in this from the test_connection function is that the original doesn't actually fail, when invoked via a task, which is why this is modified to raise.

If .test_connection had the option to raise, an alternative could be publishing/documenting:

@task
def canary(conn_id):
  return BaseHook.get_hook(conn_id=self.conn_id).test_connection(raise=True)

(though my vote would still be "why expect new users to know this random snippet, or find it in a doc or blog post?" Something off the shelf feels more friendly and useful, especially given how connections are one of the most core pieces of Airflow and are a pretty rough-edged UX at the moment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants