chore: Updates for Metrics release 1.3.0 #118

Merged
merged 55 commits into from
Jun 28, 2024
melbeltagy
Contributor

  • Ability to disable certificate verification when connecting to the Central Server in the collector module
  • Corrector optimization: multiprocessing is now used in other steps, as it already was for raw data processing
  • Reports query optimization using index hints
  • Adding support for CSV reports
  • Fix duplicate logging
  • Update vulnerable packages

Refs: OPMONDEV-185

raits and others added 30 commits October 4, 2023 14:39
* chore: fix the incorrect name in sample settings.yaml
* chore: improve documentation example
chore: Fixes #99 incorrect filename in docs
chore: removed UI elements in the opendata module web interface that allowed setting the page size and searching in results, as these are now supported in the API
feat: Ability to disable certificate verification when connecting to CS.
docs: update README and collector docs.
fix: fix anonymizer failing tests.

Refs: OPMONDEV-181
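
For readers unfamiliar with what disabling certificate verification involves, here is a minimal sketch using only the Python standard library. It is not the collector's code: the function name `build_ssl_context` and its `verify` parameter are hypothetical, and the collector may well use a different HTTP client and setting name.

```python
import ssl


def build_ssl_context(verify: bool = True) -> ssl.SSLContext:
    """Return a client SSL context; `verify` is a hypothetical toggle."""
    context = ssl.create_default_context()
    if not verify:
        # check_hostname must be switched off before verify_mode can be
        # set to CERT_NONE, otherwise the ssl module raises ValueError.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context
```

Disabling verification removes protection against man-in-the-middle attacks, so an option like this is usually meant for test environments or self-signed certificates only.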
Currently, the check `handler is WatchedFileHandler` always fails, because it compares a handler instance with the class itself. As a result, a new handler is added every time `_setup_logger` is called.

For example, the corrector logs every line three times.

A fix for opendata was added with pull request #14

This commit makes the same change in all the other services.
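
The following is a minimal sketch of how such a check could be written so that repeated calls do not add duplicate handlers. The function `setup_logger`, the logger name, and the log path are illustrative assumptions; the actual change in the services may differ.

```python
import logging
import os
import tempfile
from logging.handlers import WatchedFileHandler


def setup_logger(name: str, log_path: str) -> logging.Logger:
    """Attach a WatchedFileHandler only if the logger does not have one yet."""
    logger = logging.getLogger(name)
    # `handler is WatchedFileHandler` compares an instance with the class and
    # is always False; isinstance() checks the handler's type instead.
    has_file_handler = any(
        isinstance(handler, WatchedFileHandler) for handler in logger.handlers
    )
    if not has_file_handler:
        logger.addHandler(WatchedFileHandler(log_path))
    return logger


# Calling the setup twice returns the same logger with a single handler.
log_path = os.path.join(tempfile.mkdtemp(), "example.log")
first = setup_logger("example", log_path)
second = setup_logger("example", log_path)
```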
Performing sanitise_document and correct_structure in worker threads instead of the main thread. Computations in the main thread can use only one CPU core and therefore limit corrector throughput.
Deleting duplicates in threads instead of returning to_remove_queue and slowly processing it in the main thread.

Deleting code that was broken and no longer used after the corrector started matching documents by xRequestId:
* Removing the check for whether a document marked as duplicate exists in clean_data, because duplicates get deleted in any case
* Removing special handling for duplicate documents without requestInTs

Removed the addition of the deleted raw documents count to the total number of documents processed, as these documents were already part of the processed batch.
Since the corrector started matching documents by xRequestId, it has not checked whether a processed document already exists in clean_data.

Adding duplicate detection.
Using multiprocessing for updating status of orphans that reached timeout.

Fetching only document ids instead of full documents.
Using multiprocessing for processing of documents without xRequestId.

Adding documents without xRequestId to total number of documents processed.
The corrector currently uses bleach, which is slow and deprecated (mozilla/bleach#698). Since X-Road metrics should not contain HTML, it is more beneficial to use Python's translate method to replace potentially dangerous HTML characters. Translate does not parse HTML and is estimated to be about 100 times faster than bleach.

Using translate method instead of bleach.clean.

Renaming sanitise -> sanitize to be consistent with the rest of the code.
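
A minimal sketch of the translate-based approach is shown below. The exact set of characters and their replacements are assumptions made for illustration (the commit message does not list them), and `sanitize_text` is a hypothetical name.

```python
# Translation table built once at import time; str.maketrans accepts a dict
# mapping single characters to replacement strings of any length.
_HTML_ESCAPES = str.maketrans(
    {
        "&": "&amp;",
        "<": "&lt;",
        ">": "&gt;",
        '"': "&quot;",
        "'": "&#x27;",
    }
)


def sanitize_text(value: str) -> str:
    """Replace potentially dangerous HTML characters without parsing HTML."""
    return value.translate(_HTML_ESCAPES)
```

Because `str.translate` makes a single pass over the string with a precomputed table, it avoids the cost of building and walking an HTML parse tree.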
* MongoDB sometimes selects an incorrect index for a query
* An index hint helps to avoid unnecessarily slow queries
As X-Road usually has more service clients than producers, using an index based on the client subsystem code returns fewer rows and performs better.
Avoiding unnecessary DB requests to find the list of documents where the client and the service are the same subsystem. This information can be computed from the document data.
Additionally, fixing invalid duplicate detection when the client and producer requestInTs fall in different report periods. "get_faulty_documents" did not find duplicates in that case.
Reports query optimization using index hints
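
As an illustration of what an index hint looks like with a PyMongo-style API, here is a minimal sketch. The helper `find_with_hint` is hypothetical, and any query fields or index names passed to it are up to the caller; the only library behavior relied on is that `find()` returns a cursor and that the cursor's `hint()` accepts an index name or key specification.

```python
from typing import Any, Mapping


def find_with_hint(collection: Any, query: Mapping[str, Any], index_name: str) -> Any:
    """Run a query and tell MongoDB which index to use.

    `collection` is expected to behave like a pymongo Collection:
    find() returns a cursor, and hint() overrides the query planner's
    index choice and returns the same cursor.
    """
    return collection.find(query).hint(index_name)
```

A hint pins the query to one index, so it is only helpful when that index is known to be the better choice for the query shape, as described above for the client subsystem code.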
Some users want to process report data, but the generated PDF is not machine-readable.
CSV format can be easily imported into spreadsheets.
Adding optional configuration parameter for CSV generation.
Adding REPORT_NAME_NO_EXT variable for email template.
Adding support for CSV reports
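
To show why CSV is convenient for machine processing, here is a minimal sketch that serializes report rows with the standard `csv` module. The function `rows_to_csv` and the column names in the usage example are hypothetical and do not reflect the actual report layout.

```python
import csv
import io
from typing import Any, Iterable, Mapping, Sequence


def rows_to_csv(rows: Iterable[Mapping[str, Any]], fieldnames: Sequence[str]) -> str:
    """Serialize report rows to CSV text with a header line."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical report rows; real reports will have different columns.
example = rows_to_csv(
    [{"service": "getData", "count": 3}],
    ["service", "count"],
)
```

The resulting text can be written to a file next to the PDF and opened directly in a spreadsheet application.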
melbeltagy and others added 25 commits June 10, 2024 23:20
Refs: OPMONDEV-182
…and ignore report from git

Refs: OPMONDEV-182
…onsistency and readability

Refs: OPMONDEV-182
Refs: OPMONDEV-182
chore: Update dependencies, apply fixes (code, and tox), and update docs
Bumps the actions-update group in /.github/workflows with 2 updates: [actions/checkout](https://github.com/actions/checkout) and [actions/setup-python](https://github.com/actions/setup-python).


Updates `actions/checkout` from 3 to 4
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v3...v4)

Updates `actions/setup-python` from 4 to 5
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v4...v5)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions-update
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions-update
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps the python-minor-patch group with 1 update in the /corrector_module directory: [freezegun](https://github.com/spulec/freezegun).


Updates `freezegun` from 1.0.0 to 1.5.1
- [Release notes](https://github.com/spulec/freezegun/releases)
- [Changelog](https://github.com/spulec/freezegun/blob/master/CHANGELOG)
- [Commits](spulec/freezegun@1.0.0...1.5.1)

---
updated-dependencies:
- dependency-name: freezegun
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python-minor-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Refs: OPMONDEV-185
…ns/dot-github/workflows/actions-update-f039b2dc45
@melbeltagy melbeltagy requested a review from raits June 28, 2024 09:36
@raits raits left a comment

Looks good to me.

@melbeltagy melbeltagy merged commit 3620084 into master Jun 28, 2024
12 checks passed