Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expressions with aliased tables on right side of "is distinct from" operator in snowflake are incorrectly flagged as hard coded references. #338

Closed
1 of 5 tasks
Qwertronix opened this issue May 3, 2023 · 10 comments · Fixed by #411
Labels
bug Something isn't working hard-coded refs

Comments

@Qwertronix
Copy link

Qwertronix commented May 3, 2023

Describe the bug

Expressions with aliased tables on right side of "is distinct from" operator (snowflake SQL) are incorrectly flagged as hard coded references. This only seems to occur when the right-side expression to the operator matches the pattern table_alias.column_name -- more complex expressions or column references without a table alias don't seem to trigger this eval.

Steps to reproduce

Given this SQL in a model (using snowflake SQL for a project that only uses the snowflake adapter):

    select
        old.* exclude (fact_effective_thru, fact_history_is_current) 
        ,current_timestamp() as fact_effective_thru
        ,false as fact_history_is_current
    from
        {{ref(old_state)}} as old
    inner join {{ref(new_state)}} as new
        using(fact_durable_key)
    where
        old.ALL_SOURCE_ATTRIBUTES is distinct from new.ALL_SOURCE_ATTRIBUTES

the last line flags this model under fct_hard_coded_references.

Expected results

The evaluation should not flag new.ALL_SOURCE_ATTRIBUTES as a hard coded reference. The model should not get flagged in the case demonstrated above since there are no hard coded references in the snippet.

Actual results

The model gets flagged, with new.ALL_SOURCE_ATTRIBUTES indicated as the hard_coded_reference that flagged this model.

Screenshots and log output

20:40:21  Warning in test is_empty_fct_hard_coded_references_ (models\marts\dag\dag.yml)
20:40:21  Got 1 result, configured to warn if != 0
20:40:21
20:40:21    compiled Code at target\compiled\dbt_project_evaluator\models\marts\dag\dag.yml\is_empty_fct_hard_coded_references_.sql

System information

The contents of your packages.yml file:

packages:
  - package: dbt-labs/dbt_utils
    version: 1.0.0
  - package: dbt-labs/metrics
    version: [">=1.4.0", "<1.5.0"]
  - package: dbt-labs/dbt_project_evaluator
    version: 0.5.0

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other

The output of dbt --version:

PS C:\xxxxxxxxxxxx> dbt --version
Core:
  - installed: 1.4.5
  - latest:    1.5.0 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - snowflake: 1.4.2 - Update available!

  At least one plugin is out of date or incompatible with dbt-core.
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Additional context

The code given in the example is not best practice, and the error can be circumvented with either a cross-database macro that implements similar functionality or with an exception, but even so it should not be flagged by this eval (maybe another eval should exist for similar scenarios).

Are you interested in contributing the fix?

Unsure where to start. Maybe, depending on current methodology used for finding hard coded references.

@Qwertronix Qwertronix added the bug Something isn't working label May 3, 2023
@dave-connors-3
Copy link
Collaborator

hey @Qwertronix! great find! the logic for finding hard coded references live in this file -- I think we'd just need to adjust the regex to allow for this exception. Would you have interest in trying your hand at this change?

@Qwertronix
Copy link
Author

Hmm, I could easily put together a PR with a "dirty" fix using a fixed-length negative lookbehind ((?<!is\sdistinct\s)).

Capturing the general case with arbitrary whitespace seems orders of magnitude more complex (unless I'm missing some other way out) and might be easier to achieve with changes to the Jinja macro or perhaps as part of an implementation of #335.

In terms of jinja macro options, that might mean adding a group of extra patterns to exclude, and running those patterns against each match to find matches output from the first list of patterns that match the "bad matches" filters from the second list of patterns, and filtering those out before passing the remaining matches to all_hard_coded_references_list. That might be a lot of extra moving parts for solving this though.

@b-per
Copy link
Collaborator

b-per commented May 4, 2023

TIL that is distinct from is a thing! No idea why they picked this syntax in the standard where distinct and from already meant different things...

My 2 cents would be that a simple fixed-length negative lookbehind would be enough. I don't think that too many people would be using this syntax.

Qwertronix added a commit to Qwertronix/dbt-project-evaluator that referenced this issue May 4, 2023
previously gets flagged as hard-coded references.

fixes dbt-labs#338
@b-per
Copy link
Collaborator

b-per commented May 11, 2023

Hi @Qwertronix , I can see that you added changes in your fork. Would you want to raise a PR for that? In that case it would be great to add an integration tests that covers the issue that your changes are fixing.

@Qwertronix
Copy link
Author

Hi @b-per! For sure, that was the reason for the hold-up -- I got pulled away by other tasks before I could finish writing integration tests and testing the fix itself. Can got kicked down the road for a few days, but I'm planning to get that finished up and submitted.

@graciegoheen
Copy link
Collaborator

Hi @Qwertronix - any update on this work? :)

@Qwertronix
Copy link
Author

Hi @Qwertronix - any update on this work? :)

Yeah, ran into a bit of a blocker that made the simple naive approach I was going to take a bit of a non-starter. Obvious in hindsight. Can't negative-lookbehind both the "is distinct from" and "is not distinct from" at the same time in the same regex, so this needs a different solution than the one I was envisioning. Other business duties have taken priority in the meantime as usual :\

Copy link
Contributor

github-actions bot commented Jan 3, 2024

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@graciegoheen
Copy link
Collaborator

graciegoheen commented Jan 4, 2024

I was able to reproduce by updating the SQL in stg_model_5 to:

select 
    * 
from {{ ref('fct_model_9') }}
-- inner join {{ ref('fct_model_6') }}
-- on fct_model_9.join_field = fct_model_6.join_field
-- where fct_model_9.id is distinct from fct_model_6.id

I attempted adding a negative look back for distinct to each of the regex checks, but for some reason it's still flagging it...

            'from_table_1':
                '(?ix)

                # NOT following "distinct "
                (?<!distinct )

                # first matching group
                # from or join followed by at least 1 whitespace character            
                (from|join)\s+

                # second matching group
                # 1 or 0 of (opening bracket, backtick, or quotation mark)
                ([\[`\"\']?)

                # third matching group
                # at least 1 word character
                (\w+)

                # fouth matching group
                # 1 or 0 of (closing bracket, backtick, or quotation mark)
                ([\]`\"\']?)

                # fifth matching group
                # a period
                (\.)

                # sixth matching group
                # 1 or 0 of (opening bracket, backtick, or quotation mark)
                ([\[`\"\']?)

                # seventh matching group
                # at least 1 word character
                (\w+)

                # eighth matching group
                # 1 or 0 of (closing bracket, backtick, or quotation mark) folowed by a whitespace character or end of string
                ([\]`\"\']?)(?=\s|$)

                ',

Any ideas why this wouldn't work?

@graciegoheen
Copy link
Collaborator

I figured it out! I need to use the space \s instead of an actual space. ¯_(ツ)_/¯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working hard-coded refs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants