-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update fix/analytics data export cleanup #4300
Update fix/analytics data export cleanup #4300
Conversation
📝 WalkthroughWalkthroughThis pull request introduces significant refactoring across multiple files in the analytics API, focusing on consolidating and standardizing the handling of sites and devices. The changes primarily involve renaming parameters, updating SQL query constructions, and creating a unified filtering approach for sites and devices. The modifications aim to improve code clarity, reduce redundancy, and enhance the consistency of data retrieval and filtering methods. Changes
Possibly related PRs
Suggested Reviewers
Poem
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 9
🧹 Nitpick comments (3)
src/analytics/api/utils/data_formatters.py (1)
293-304
: Enhance type hints and documentation.While the function signature and docstring are clear, they could be more detailed:
-def filter_non_private_sites_devices( - filter_type: str, filter_value: List[str] -) -> Dict[str, Any]: +def filter_non_private_sites_devices( + filter_type: Literal["devices", "sites"], filter_value: List[str] +) -> Dict[str, Union[str, List[str], bool]]: """ Filters out private device/site IDs from a provided array of device IDs. Args: - filter_type(str): A string either devices or sites. - filter_value(List[str]): List of device/site ids to filter against. + filter_type(Literal["devices", "sites"]): The type of IDs to filter ("devices" or "sites"). + filter_value(List[str]): List of device/site IDs to filter against. Returns: - A response dictionary object that contains a list of non-private device/site ids if any. + Dict[str, Union[str, List[str], bool]]: A response dictionary containing: + - status (str): "success" or "error" + - data (List[str]): List of non-private device/site IDs + - success (bool): True if operation was successful + + Example: + >>> result = filter_non_private_sites_devices("devices", ["device1", "device2"]) + >>> print(result) + {"status": "success", "data": ["device1"], "success": True} """src/analytics/api/models/events.py (2)
44-57
: Enhance constructor documentation.While the change from tenant to network is good, the docstring could be more detailed:
def __init__(self, network): """ Initializes the EventsModel with default settings and mappings for limit thresholds, and specifies collections and BigQuery table references. Args: - network (str): The network identifier for managing database collections. + network (str): The network identifier (e.g., 'airqo') used to: + - Manage database collections + - Filter data by network + - Route requests to the appropriate BigQuery tables + + Example: + >>> model = EventsModel(network="airqo") """
197-201
: Improve SQL query formatting for better readability.While the table reference changes are correct, the query formatting could be improved:
- f"{pollutants_query}, {time_grouping}, {self.device_info_query}, {self.devices_table}.name AS device_name " - f"FROM {data_table} " - f"JOIN {self.devices_table} ON {self.devices_table}.device_id = {data_table}.device_id " - f"WHERE {data_table}.timestamp BETWEEN '{start_date}' AND '{end_date}' " - f"AND {self.devices_table}.device_id IN UNNEST(@filter_value) " + f""" + {pollutants_query}, + {time_grouping}, + {self.device_info_query}, + {self.devices_table}.name AS device_name + FROM {data_table} + JOIN {self.devices_table} + ON {self.devices_table}.device_id = {data_table}.device_id + WHERE {data_table}.timestamp BETWEEN '{start_date}' AND '{end_date}' + AND {self.devices_table}.device_id IN UNNEST(@filter_value) + """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/analytics/api/models/events.py
(7 hunks)src/analytics/api/utils/data_formatters.py
(2 hunks)src/analytics/api/views/dashboard.py
(8 hunks)src/analytics/api/views/data.py
(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (javascript)
🔇 Additional comments (2)
src/analytics/api/utils/data_formatters.py (1)
255-255
: LGTM! Consistent variable naming.The change from 'tenant' to 'network' aligns with the standardized terminology being used across the codebase.
src/analytics/api/models/events.py (1)
29-30
: LGTM! More specific table references.The changes to use more specific table names (
BIGQUERY_SITES_SITES
andBIGQUERY_DEVICES_DEVICES
) improve clarity and reduce potential confusion.
if len(filter_value) == 0: | ||
raise ValueError(f"{filter_type} can't be empty") | ||
|
||
endpoint: Dict = { | ||
"devices": "devices/cohorts/filterNonPrivateDevices", | ||
"sites": "devices/grids/filterNonPrivateSites", | ||
} | ||
|
||
endpoint: str = "devices/cohorts/filterNonPrivateDevices" | ||
try: | ||
airqo_requests = AirQoRequests() | ||
response = airqo_requests.request( | ||
endpoint=endpoint, body={filter_type: devices}, method="post" | ||
endpoint=endpoint.get(filter_type), | ||
body={filter_type: filter_value}, | ||
method="post", | ||
) | ||
if response and response.get("status") == "success": | ||
return airqo_requests.create_response( | ||
message="Successfully returned data.", | ||
data=response.get("data"), | ||
data=response.get("data", {}).get(filter_type, []), | ||
success=True, | ||
) | ||
else: | ||
return airqo_requests.create_response(response, success=False) | ||
except Exception as rex: | ||
logger.exception(f"Error while filtering non private devices {rex}") | ||
except Exception as e: | ||
logger.exception(f"Error while filtering non private {filter_type}: {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Enhance input validation and error handling.
The implementation could be more robust with additional validations and consistent error handling:
+ ALLOWED_FILTER_TYPES = {"devices", "sites"}
+
+ if filter_type not in ALLOWED_FILTER_TYPES:
+ raise ValueError(f"filter_type must be one of {ALLOWED_FILTER_TYPES}")
+
if len(filter_value) == 0:
- raise ValueError(f"{filter_type} can't be empty")
+ raise ValueError(f"filter_value for {filter_type} cannot be empty")
endpoint: Dict = {
"devices": "devices/cohorts/filterNonPrivateDevices",
"sites": "devices/grids/filterNonPrivateSites",
}
try:
airqo_requests = AirQoRequests()
response = airqo_requests.request(
endpoint=endpoint.get(filter_type),
body={filter_type: filter_value},
method="post",
)
if response and response.get("status") == "success":
return airqo_requests.create_response(
message="Successfully returned data.",
data=response.get("data", {}).get(filter_type, []),
success=True,
)
- else:
- return airqo_requests.create_response(response, success=False)
+ return airqo_requests.create_response(
+ message=f"Failed to filter {filter_type}. Server response: {response}",
+ success=False
+ )
except Exception as e:
- logger.exception(f"Error while filtering non private {filter_type}: {e}")
+ error_msg = f"Error while filtering non-private {filter_type}: {str(e)}"
+ logger.exception(error_msg)
+ return airqo_requests.create_response(
+ message=error_msg,
+ success=False
+ )
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if len(filter_value) == 0: | |
raise ValueError(f"{filter_type} can't be empty") | |
endpoint: Dict = { | |
"devices": "devices/cohorts/filterNonPrivateDevices", | |
"sites": "devices/grids/filterNonPrivateSites", | |
} | |
endpoint: str = "devices/cohorts/filterNonPrivateDevices" | |
try: | |
airqo_requests = AirQoRequests() | |
response = airqo_requests.request( | |
endpoint=endpoint, body={filter_type: devices}, method="post" | |
endpoint=endpoint.get(filter_type), | |
body={filter_type: filter_value}, | |
method="post", | |
) | |
if response and response.get("status") == "success": | |
return airqo_requests.create_response( | |
message="Successfully returned data.", | |
data=response.get("data"), | |
data=response.get("data", {}).get(filter_type, []), | |
success=True, | |
) | |
else: | |
return airqo_requests.create_response(response, success=False) | |
except Exception as rex: | |
logger.exception(f"Error while filtering non private devices {rex}") | |
except Exception as e: | |
logger.exception(f"Error while filtering non private {filter_type}: {e}") | |
ALLOWED_FILTER_TYPES = {"devices", "sites"} | |
if filter_type not in ALLOWED_FILTER_TYPES: | |
raise ValueError(f"filter_type must be one of {ALLOWED_FILTER_TYPES}") | |
if len(filter_value) == 0: | |
raise ValueError(f"filter_value for {filter_type} cannot be empty") | |
endpoint: Dict = { | |
"devices": "devices/cohorts/filterNonPrivateDevices", | |
"sites": "devices/grids/filterNonPrivateSites", | |
} | |
try: | |
airqo_requests = AirQoRequests() | |
response = airqo_requests.request( | |
endpoint=endpoint.get(filter_type), | |
body={filter_type: filter_value}, | |
method="post", | |
) | |
if response and response.get("status") == "success": | |
return airqo_requests.create_response( | |
message="Successfully returned data.", | |
data=response.get("data", {}).get(filter_type, []), | |
success=True, | |
) | |
return airqo_requests.create_response( | |
message=f"Failed to filter {filter_type}. Server response: {response}", | |
success=False | |
) | |
except Exception as e: | |
error_msg = f"Error while filtering non-private {filter_type}: {str(e)}" | |
logger.exception(error_msg) | |
return airqo_requests.create_response( | |
message=error_msg, | |
success=False | |
) |
sites = filter_non_private_sites_devices( | ||
filter_type="sites", filter_value=json_data.get("sites", {}) | ||
).get("data", []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the default value for sites filter.
Using an empty dict as the default value for json_data.get("sites", {})
is incorrect as the function expects a list:
- sites = filter_non_private_sites_devices(
- filter_type="sites", filter_value=json_data.get("sites", {})
- ).get("data", [])
+ sites = filter_non_private_sites_devices(
+ filter_type="sites", filter_value=json_data.get("sites", [])
+ ).get("data", [])
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
sites = filter_non_private_sites_devices( | |
filter_type="sites", filter_value=json_data.get("sites", {}) | |
).get("data", []) | |
sites = filter_non_private_sites_devices( | |
filter_type="sites", filter_value=json_data.get("sites", []) | |
).get("data", []) |
devices = filter_non_private_sites_devices( | ||
filter_type="devices", filter_value=json_data.get("devices", {}) | ||
).get("data", []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the default value for devices filter.
Using an empty dict as the default value for json_data.get("devices", {})
is incorrect as the function expects a list:
- devices = filter_non_private_sites_devices(
- filter_type="devices", filter_value=json_data.get("devices", {})
- ).get("data", [])
+ devices = filter_non_private_sites_devices(
+ filter_type="devices", filter_value=json_data.get("devices", [])
+ ).get("data", [])
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
devices = filter_non_private_sites_devices( | |
filter_type="devices", filter_value=json_data.get("devices", {}) | |
).get("data", []) | |
devices = filter_non_private_sites_devices( | |
filter_type="devices", filter_value=json_data.get("devices", []) | |
).get("data", []) |
sites = filter_non_private_sites_devices( | ||
filter_type="sites", filter_value=json_data.get("sites", {}) | ||
).get("data", []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the default value for sites filter.
Using an empty dict as the default value for json_data.get("sites", {})
is incorrect as the function expects a list:
- sites = filter_non_private_sites_devices(
- filter_type="sites", filter_value=json_data.get("sites", {})
- ).get("data", [])
+ sites = filter_non_private_sites_devices(
+ filter_type="sites", filter_value=json_data.get("sites", [])
+ ).get("data", [])
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
sites = filter_non_private_sites_devices( | |
filter_type="sites", filter_value=json_data.get("sites", {}) | |
).get("data", []) | |
sites = filter_non_private_sites_devices( | |
filter_type="sites", filter_value=json_data.get("sites", []) | |
).get("data", []) |
devices = filter_non_private_sites_devices( | ||
filter_type="devices", filter_value=json_data.get("devices", {}) | ||
).get("data", []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the default value for devices filter.
Using an empty dict as the default value for json_data.get("devices", {})
is incorrect as the function expects a list:
- devices = filter_non_private_sites_devices(
- filter_type="devices", filter_value=json_data.get("devices", {})
- ).get("data", [])
+ devices = filter_non_private_sites_devices(
+ filter_type="devices", filter_value=json_data.get("devices", [])
+ ).get("data", [])
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
devices = filter_non_private_sites_devices( | |
filter_type="devices", filter_value=json_data.get("devices", {}) | |
).get("data", []) | |
devices = filter_non_private_sites_devices( | |
filter_type="devices", filter_value=json_data.get("devices", []) | |
).get("data", []) |
sites = filter_non_private_sites_devices( | ||
filter_type="sites", filter_value=json_data.get("sites", {}) | ||
).get("data", []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the default value for sites filter.
Using an empty dict as the default value for json_data.get("sites", {})
is incorrect as the function expects a list:
- sites = filter_non_private_sites_devices(
- filter_type="sites", filter_value=json_data.get("sites", {})
- ).get("data", [])
+ sites = filter_non_private_sites_devices(
+ filter_type="sites", filter_value=json_data.get("sites", [])
+ ).get("data", [])
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
sites = filter_non_private_sites_devices( | |
filter_type="sites", filter_value=json_data.get("sites", {}) | |
).get("data", []) | |
sites = filter_non_private_sites_devices( | |
filter_type="sites", filter_value=json_data.get("sites", []) | |
).get("data", []) |
validated_value = filter_non_private_sites_devices( | ||
filter_type, filter_value | ||
) | ||
elif filter_type in devices: | ||
validated_value = filter_non_private_devices(filter_type, filter_value) | ||
validated_value = filter_non_private_sites_devices( | ||
filter_type, filter_value | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add filter type validation for device registry calls.
The filter_type is passed directly from the provided_filters list without ensuring it matches the expected values ("devices" or "sites"):
+ def normalize_filter_type(filter_type: str) -> str:
+ if filter_type in sites:
+ return "sites"
+ if filter_type in devices:
+ return "devices"
+ return filter_type
if filter_type in sites:
validated_value = filter_non_private_sites_devices(
- filter_type, filter_value
+ normalize_filter_type(filter_type), filter_value
)
elif filter_type in devices:
validated_value = filter_non_private_sites_devices(
- filter_type, filter_value
+ normalize_filter_type(filter_type), filter_value
)
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.8.2)
187-194: Combine if
branches using logical or
operator
Combine if
branches
(SIM114)
sites = filter_non_private_sites_devices( | ||
filter_type="sites", filter_value=json_data.get("sites", {}) | ||
).get("data", []) | ||
devices = filter_non_private_sites_devices( | ||
filter_type="devices", filter_value=json_data.get("devices", {}) | ||
).get("data", []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the default values for sites and devices filters.
Using empty dicts as default values for json_data.get("sites", {})
and json_data.get("devices", {})
is incorrect as the function expects lists:
- sites = filter_non_private_sites_devices(
- filter_type="sites", filter_value=json_data.get("sites", {})
- ).get("data", [])
- devices = filter_non_private_sites_devices(
- filter_type="devices", filter_value=json_data.get("devices", {})
- ).get("data", [])
+ sites = filter_non_private_sites_devices(
+ filter_type="sites", filter_value=json_data.get("sites", [])
+ ).get("data", [])
+ devices = filter_non_private_sites_devices(
+ filter_type="devices", filter_value=json_data.get("devices", [])
+ ).get("data", [])
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
sites = filter_non_private_sites_devices( | |
filter_type="sites", filter_value=json_data.get("sites", {}) | |
).get("data", []) | |
devices = filter_non_private_sites_devices( | |
filter_type="devices", filter_value=json_data.get("devices", {}) | |
).get("data", []) | |
sites = filter_non_private_sites_devices( | |
filter_type="sites", filter_value=json_data.get("sites", []) | |
).get("data", []) | |
devices = filter_non_private_sites_devices( | |
filter_type="devices", filter_value=json_data.get("devices", []) | |
).get("data", []) |
validated_value = filter_non_private_sites_devices( | ||
filter_type, filter_value | ||
) | ||
elif filter_type in devices: | ||
validated_value = filter_non_private_devices(filter_type, filter_value) | ||
validated_value = filter_non_private_sites_devices( | ||
filter_type, filter_value | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add filter type validation for device registry calls.
The filter_type is passed directly without ensuring it matches the expected values ("devices" or "sites"):
+ def normalize_filter_type(filter_type: str) -> str:
+ if filter_type in sites:
+ return "sites"
+ if filter_type in devices:
+ return "devices"
+ return filter_type
if filter_type in sites:
validated_value = filter_non_private_sites_devices(
- filter_type, filter_value
+ normalize_filter_type(filter_type), filter_value
)
elif filter_type in devices:
validated_value = filter_non_private_sites_devices(
- filter_type, filter_value
+ normalize_filter_type(filter_type), filter_value
)
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.8.2)
243-250: Combine if
branches using logical or
operator
Combine if
branches
(SIM114)
Description
Just some cleanup
Summary by CodeRabbit
Refactor
tenant
tonetwork
across multiple filesBug Fixes
Chores