Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DWX-19926: Add Impala properties #216

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

DWX-19926: Add Impala properties #216

wants to merge 13 commits into from

Conversation

prateek1192
Copy link
Contributor

This commit adds the Impala resource and adds all the remaining properties except for a config and a subset of Autoscaling group. Those two need changes in Impala APIs as such would make those changes and would come back and try to enable the code here.

@prateek1192 prateek1192 requested a review from tevesz February 24, 2025 09:27
@prateek1192 prateek1192 requested a review from a team as a code owner February 24, 2025 09:27
@gregito
Copy link
Contributor

gregito commented Feb 24, 2025

@prateek1192 this PR looks like something really in progress, could you please change it to draft until it will be ready for review?

@prateek1192
Copy link
Contributor Author

@prateek1192 this PR looks like something really in progress, could you please change it to draft until it will be ready for review?

I let some TODOs remain so that I can fix in Impala APIs and come back here but lemme remove those as well .

@@ -27,7 +27,48 @@ resource "cdp_dw_vw_impala" "impala-terraform" {
database_catalog_id = var.database_catalog_id
name = var.name
image_version = var.image_version
tshirt_size = var.tshirt_size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you forgot to add the example file to the change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out, I added it

scratch_space_limit = var.scratch_space_limit
}

impala_ha_settings = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an impala resource, simply ha_settings should be clear, no need to duplicate the type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this is more consistent with swagger and also since this is a impala specific options and if at some point Hive adds it no matter how unlikely it feels right now, we would have a conflict. Lets just keep it.


enable_unified_analytics = var.enable_unified_analytics

query_isolation_options = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure impala supports this? query isolation is a feature in hive

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, this exists! can you implement in a way that it can be set only if the enable_unified_analytics is set? one other way to do it to add these into a "unified_analytics" group and set the query isolation from there, like

unified_analytics = {
   enabled = bool
   query_isolation_max_queries = var.max_queries
   query_isolation_max_nodes_per_query = var.max_nodes_per_query
}

all options are optional I think

}

impala_options = {
scratch_space_limit = var.scratch_space_limit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can the scratch space limit be set in azure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked with Abhishek Rawat and he said that Azure does have it

hive_authentication_mode = var.hive_authentication_mode
platform_jwt_auth = var.platform_jwt_auth
impala_query_log = var.impala_query_log
ebs_llap_spill_gb = var.ebs_llap_spill_gb
Copy link
Contributor

@tevesz tevesz Feb 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llap is in hive only

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

}

resource_pool = var.resource_pool
hive_authentication_mode = var.hive_authentication_mode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this property represent? what auth mode can be set?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks deprecated, because both kerberos and ldap options are enabled by default. However there should an an enable_sso option for impala and hive too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, removed

max_nodes_per_query = var.max_nodes_per_query
}

resource_pool = var.resource_pool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the quota management feature, only private cloud supports this, it can be dropped for now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done

min_clusters = var.min_clusters
}

impala_options = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of these are impala options :) if the scratch space can only be set in aws can you rename this to aws_options? I think the availability_zone can only be set in case of AWS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean you are right :)
But I am trying to stay close to the swagger docs as much as possible, later on when someone looks at this and tries to figure out and sees that we have opiniated abstractions on top of swagger naming, that would be a real mess.

"impala_scale_up_delay_seconds": types.Int32Type,
"max_clusters": types.Int32Type,
"min_clusters": types.Int32Type,
/*"impala_executor_group_sets": types.ListType{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah you should remove these from here. But keep them on a local branch, I mean if you will add them later it is good to keep it around somewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I added a backup branch with the name of this branch but backup appended in the end so that I can come back and fix :)
Thanks for the review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants