-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Update is_(not)_in_range (#87) to support max/min limits from col #153
Conversation
✅ 126/126 passed, 1 skipped, 17m53s total Running from acceptance #498 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
min_limit: int | datetime.date | datetime.datetime | str, | ||
max_limit: int | datetime.date | datetime.datetime | str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make it a bit different:
- add support for a string as a valid range
- add support for a column object as min/max arguments, because people may use not simple column names but more complex expressions (via
F.expr
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree, updated code to support string ranges and column expression
5734785
to
1bd0402
Compare
min_limit: int | datetime.date | datetime.datetime | str = '', | ||
max_limit: int | datetime.date | datetime.datetime | str = '', | ||
min_limit_col_expr: str | Column = '', | ||
max_limit_col_expr: str | Column = '', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use Optional type or None
instead of default of empty string
min_limit: int | datetime.date | datetime.datetime | str = '', | ||
max_limit: int | datetime.date | datetime.datetime | str = '', | ||
min_limit_col_expr: str | Column = '', | ||
max_limit_col_expr: str | Column = '', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here as well
min_limit: int | datetime.date | datetime.datetime | str = '', | ||
max_limit: int | datetime.date | datetime.datetime | str = '', | ||
min_limit_col_expr: str | Column = '', | ||
max_limit_col_expr: str | Column = '', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here
if min_limit_col_expr is None: | ||
min_limit_expr = F.lit(min_limit) | ||
else: | ||
min_limit_expr = F.col(min_limit_col_expr) if isinstance(min_limit_col_expr, str) else min_limit_col_expr | ||
if max_limit_col_expr is None: | ||
max_limit_expr = F.lit(max_limit) | ||
else: | ||
max_limit_expr = F.col(max_limit_col_expr) if isinstance(max_limit_col_expr, str) else max_limit_col_expr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to throw an error if both min_limit
and min_limit_col_expr
are None
(same for max_limit
and max_limit_col_expr
).
Otherwise we'll get conditions like (F.col(col_name) < F.lit(None)) | (F.col(col_name) > F.lit(None))
* Fixed cli installation and demo ([#177](#177)). In this release, changes have been made to adjust the dashboard name, ensuring compliance with new API naming rules. The dashboard name now only contains alphanumeric characters, hyphens, or underscores, and the reference section has been split for clarity. In addition, demo for the tool has been updated to work regardless if a path or UC table is provided in the config. Furthermore, documentation has been refactored and udpated to improve clarity. The following issue have been closed: [#171](#171) and [#198](#198). * [Feature] Update is_(not)_in_range ([#87](#87)) to support max/min limits from col ([#153](#153)). In this release, the `is_in_range` and `is_not_in_range` quality rule functions have been updated to support a column as the minimum or maximum limit, in addition to a literal value. This change is accomplished through the introduction of optional `min_limit_col_expr` and `max_limit_col_expr` arguments, allowing users to specify a column expression as the minimum or maximum limit. Extensive testing, including unit tests and integration tests, has been conducted to ensure the correct behavior of the new functionality. These enhancements offer increased flexibility when defining quality rules, catering to a broader range of use cases and scenarios.
* Fixed cli installation and demo ([#177](#177)). In this release, changes have been made to adjust the dashboard name, ensuring compliance with new API naming rules. The dashboard name now only contains alphanumeric characters, hyphens, or underscores, and the reference section has been split for clarity. In addition, demo for the tool has been updated to work regardless if a path or UC table is provided in the config. Furthermore, documentation has been refactored and udpated to improve clarity. The following issue have been closed: [#171](#171) and [#198](#198). * [Feature] Update is_(not)_in_range ([#87](#87)) to support max/min limits from col ([#153](#153)). In this release, the `is_in_range` and `is_not_in_range` quality rule functions have been updated to support a column as the minimum or maximum limit, in addition to a literal value. This change is accomplished through the introduction of optional `min_limit_col_expr` and `max_limit_col_expr` arguments, allowing users to specify a column expression as the minimum or maximum limit. Extensive testing, including unit tests and integration tests, has been conducted to ensure the correct behavior of the new functionality. These enhancements offer increased flexibility when defining quality rules, catering to a broader range of use cases and scenarios.
Changes
Changes in the is_in_range and is_not_in_range column function to handle a column as the min/max limit along with the literal value.
Linked issues
Resolves #87
Tests