Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Update is_(not)_in_range (#87) to support max/min limits from col #153

Merged
merged 10 commits into from
Feb 26, 2025

Conversation

karthik-ballullaya-db
Copy link
Contributor

@karthik-ballullaya-db karthik-ballullaya-db commented Feb 5, 2025

Changes

Changes in the is_in_range and is_not_in_range column function to handle a column as the min/max limit along with the literal value.

Linked issues

Resolves #87

Tests

  • manually tested
  • added unit tests
  • added integration tests

Copy link

github-actions bot commented Feb 5, 2025

✅ 126/126 passed, 1 skipped, 17m53s total

Running from acceptance #498

Copy link
Contributor

@mwojtyczka mwojtyczka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 318 to 319
min_limit: int | datetime.date | datetime.datetime | str,
max_limit: int | datetime.date | datetime.datetime | str,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make it a bit different:

  1. add support for a string as a valid range
  2. add support for a column object as min/max arguments, because people may use not simple column names but more complex expressions (via F.expr).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, updated code to support string ranges and column expression

Comment on lines 285 to 288
min_limit: int | datetime.date | datetime.datetime | str = '',
max_limit: int | datetime.date | datetime.datetime | str = '',
min_limit_col_expr: str | Column = '',
max_limit_col_expr: str | Column = '',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use Optional type or None instead of default of empty string

Comment on lines 311 to 314
min_limit: int | datetime.date | datetime.datetime | str = '',
max_limit: int | datetime.date | datetime.datetime | str = '',
min_limit_col_expr: str | Column = '',
max_limit_col_expr: str | Column = '',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here as well

Comment on lines 348 to 351
min_limit: int | datetime.date | datetime.datetime | str = '',
max_limit: int | datetime.date | datetime.datetime | str = '',
min_limit_col_expr: str | Column = '',
max_limit_col_expr: str | Column = '',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here

Comment on lines +298 to +305
if min_limit_col_expr is None:
min_limit_expr = F.lit(min_limit)
else:
min_limit_expr = F.col(min_limit_col_expr) if isinstance(min_limit_col_expr, str) else min_limit_col_expr
if max_limit_col_expr is None:
max_limit_expr = F.lit(max_limit)
else:
max_limit_expr = F.col(max_limit_col_expr) if isinstance(max_limit_col_expr, str) else max_limit_col_expr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to throw an error if both min_limit and min_limit_col_expr are None (same for max_limit and max_limit_col_expr).

Otherwise we'll get conditions like (F.col(col_name) < F.lit(None)) | (F.col(col_name) > F.lit(None))

@alexott alexott merged commit 7a4fa87 into main Feb 26, 2025
9 checks passed
@alexott alexott deleted the feat/(not)_in_range_col_func branch February 26, 2025 12:31
mwojtyczka added a commit that referenced this pull request Feb 27, 2025
* Fixed cli installation and demo ([#177](#177)). In this release, changes have been made to adjust the dashboard name, ensuring compliance with new API naming rules. The dashboard name now only contains alphanumeric characters, hyphens, or underscores, and the reference section has been split for clarity. In addition, demo for the tool has been updated to work regardless if a path or UC table is provided in the config. Furthermore, documentation has been refactored and udpated to improve clarity. The following issue have been closed: [#171](#171) and [#198](#198).
* [Feature] Update is_(not)_in_range ([#87](#87)) to support max/min limits from col ([#153](#153)). In this release, the `is_in_range` and `is_not_in_range` quality rule functions have been updated to support a column as the minimum or maximum limit, in addition to a literal value. This change is accomplished through the introduction of optional `min_limit_col_expr` and `max_limit_col_expr` arguments, allowing users to specify a column expression as the minimum or maximum limit. Extensive testing, including unit tests and integration tests, has been conducted to ensure the correct behavior of the new functionality. These enhancements offer increased flexibility when defining quality rules, catering to a broader range of use cases and scenarios.
@mwojtyczka mwojtyczka mentioned this pull request Feb 27, 2025
mwojtyczka added a commit that referenced this pull request Feb 27, 2025
* Fixed cli installation and demo
([#177](#177)). In this
release, changes have been made to adjust the dashboard name, ensuring
compliance with new API naming rules. The dashboard name now only
contains alphanumeric characters, hyphens, or underscores, and the
reference section has been split for clarity. In addition, demo for the
tool has been updated to work regardless if a path or UC table is
provided in the config. Furthermore, documentation has been refactored
and udpated to improve clarity. The following issue have been closed:
[#171](#171) and
[#198](#198).
* [Feature] Update is_(not)_in_range
([#87](#87)) to support
max/min limits from col
([#153](#153)). In this
release, the `is_in_range` and `is_not_in_range` quality rule functions
have been updated to support a column as the minimum or maximum limit,
in addition to a literal value. This change is accomplished through the
introduction of optional `min_limit_col_expr` and `max_limit_col_expr`
arguments, allowing users to specify a column expression as the minimum
or maximum limit. Extensive testing, including unit tests and
integration tests, has been conducted to ensure the correct behavior of
the new functionality. These enhancements offer increased flexibility
when defining quality rules, catering to a broader range of use cases
and scenarios.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE]: is_(not)_in_range supports min/max_limit from the dataframe
3 participants