
fix(otaclient v3.8.x): Add timeout when downloading "metadata.jwt" and *.pem files #501

Merged: 4 commits, Mar 24, 2025


Zhenfeng-Sun commented Mar 18, 2025

Related JIRA: RT4-15059

About the fix

At the beginning of the OTA process, otaclient downloads the "metadata.jwt" file and then the certificate (.pem) file.
In some cases these files cannot be downloaded correctly.

For example:
1. The ECU network configuration is incorrect.
2. The certificate file is incorrect.
etc.

When this happens, the current behavior is that otaclient keeps retrying endlessly.
The end user is not made aware of the error and will likely keep waiting for a long time.

This fix makes the retry process time out after 5 minutes.
After that, the end user is informed that the OTA update was not successful.
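The bounded retry described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the otaclient code: fetch_with_timeout, DownloadTimeout and the fetch callable are hypothetical names, and the 300-second default simply mirrors the 5-minute limit in this PR. It tracks elapsed time with a monotonic clock rather than counting attempts, so the limit stays in seconds whatever the retry interval is.

```python
import time
from typing import Callable, Optional


class DownloadTimeout(Exception):
    """Hypothetical error raised when the retry time budget is exhausted."""


def fetch_with_timeout(
    fetch: Callable[[], bytes],
    timeout: float = 300.0,  # 5 minutes, mirroring this PR (assumed value)
    retry_interval: float = 1.0,
) -> bytes:
    """Call fetch() until it succeeds or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    last_exc: Optional[Exception] = None
    while True:
        try:
            return fetch()
        except Exception as exc:  # e.g. server unreachable due to bad ECU network config
            last_exc = exc
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        # Never sleep past the deadline.
        time.sleep(min(retry_interval, remaining))
    raise DownloadTimeout(f"download did not succeed within {timeout}s") from last_exc
```

The caller can then catch DownloadTimeout and report the OTA as failed instead of waiting indefinitely.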

github-actions bot commented Mar 18, 2025

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---:|---:|---:|---|
| src/ota_metadata/legacy | | | | |
| __init__.py | 11 | 0 | 100% | |
| parser.py | 346 | 42 | 87% | 104, 163, 168, 204–205, 215–216, 219, 231, 289, 299–302, 341–344, 424, 427, 435–437, 450, 459–460, 463–464, 679–680, 690, 692, 695–696, 726–729, 779, 782–784 |
| types.py | 84 | 13 | 84% | 37, 40–42, 112–116, 122–125 |
| src/ota_proxy | | | | |
| __init__.py | 36 | 10 | 72% | 59, 61, 63, 72, 81–82, 102, 104–106 |
| __main__.py | 7 | 7 | 0% | 16–18, 20, 22–23, 25 |
| _consts.py | 17 | 0 | 100% | |
| cache_control_header.py | 68 | 4 | 94% | 71, 91, 113, 121 |
| cache_streaming.py | 144 | 13 | 90% | 211, 225, 229–230, 265–266, 268, 280, 349, 367–370 |
| config.py | 17 | 0 | 100% | |
| db.py | 74 | 18 | 75% | 110, 116, 154, 160–161, 164, 170, 172, 193–200, 202–203 |
| errors.py | 5 | 0 | 100% | |
| lru_cache_helper.py | 47 | 2 | 95% | 84–85 |
| ota_cache.py | 228 | 61 | 73% | 70–71, 140, 151–152, 184–185, 202, 239–243, 247–249, 251, 253–260, 262–264, 267–268, 272–273, 277, 324, 332–334, 407, 434, 437–438, 460–462, 466–468, 474, 476–478, 483, 509–511, 546–548, 575, 581, 596 |
| server_app.py | 141 | 39 | 72% | 79, 82, 88, 107, 111, 170, 179, 221–222, 224–226, 229, 234–235, 238, 241–242, 245, 248, 251, 254, 267–268, 271–272, 274, 277, 303–306, 309, 323–325, 331–333 |
| utils.py | 14 | 0 | 100% | |
| src/otaclient | | | | |
| __init__.py | 5 | 2 | 60% | 17, 19 |
| __main__.py | 1 | 1 | 0% | 16 |
| log_setting.py | 52 | 5 | 90% | 53, 55, 64–66 |
| src/otaclient/app | | | | |
| __main__.py | 1 | 1 | 0% | 16 |
| configs.py | 76 | 0 | 100% | |
| errors.py | 120 | 0 | 100% | |
| interface.py | 3 | 0 | 100% | |
| main.py | 46 | 5 | 89% | 52–53, 75–77 |
| ota_client.py | 381 | 115 | 69% | 80, 88, 109, 136, 138–139, 142–143, 145–146, 150, 154–155, 160–161, 167, 169, 207–210, 216, 220, 226, 345, 357–358, 360, 369, 372, 377–378, 381, 387, 389–393, 412–415, 418–429, 457–460, 506–507, 511, 513–514, 544–545, 554–561, 568, 571–577, 622–625, 633, 669–671, 676–678, 681–682, 684–685, 687, 745–746, 749, 757–758, 761, 772–773, 776, 784–785, 788, 799, 818, 845, 864, 882 |
| ota_client_stub.py | 393 | 109 | 72% | 75–77, 79–80, 88–91, 94–96, 100, 105–106, 108–109, 112, 114–115, 118–120, 123–124, 127–129, 134–139, 143, 146–150, 152–153, 161–163, 166, 203–205, 210, 246, 271, 274, 277, 381, 405, 407, 431, 477, 534, 604–605, 644, 663–665, 671–674, 678–680, 687–689, 692, 696–699, 752, 841–843, 850, 880–881, 884–888, 897–906, 913, 919, 922–923, 927, 930 |
| update_stats.py | 104 | 9 | 91% | 57, 103, 105, 114, 116, 125, 127, 148, 179 |
| src/otaclient/boot_control | | | | |
| __init__.py | 4 | 0 | 100% | |
| _common.py | 248 | 112 | 54% | 74–75, 96–98, 114–115, 135–136, 155–156, 175–176, 195–196, 218–220, 235–236, 260–266, 287, 295, 313, 321, 340–341, 344–345, 368, 370–379, 381–390, 392–394, 413, 416, 424, 432, 448–450, 452–457, 550, 555, 560, 673, 677–678, 681, 689, 691–692, 718–719, 721–724, 729, 735–736, 739–740, 742, 749–750, 761–767, 777–779, 783–784, 787–788, 791, 797 |
| _firmware_package.py | 94 | 22 | 76% | 83, 87, 137, 181, 187, 210–211, 214–219, 221–222, 225–230, 232 |
| _grub.py | 417 | 128 | 69% | 217, 265–268, 274–278, 315–316, 323–328, 331–337, 340, 343–344, 349, 351–353, 362–368, 370–371, 373–375, 384–386, 388–390, 469–470, 474–475, 527, 533, 559, 581, 585–586, 601–603, 627–630, 642, 646–648, 650–652, 711–714, 739–742, 765–768, 780–781, 784–785, 820, 826, 846–847, 849, 861, 864, 867, 870, 874–876, 894–897, 925–928, 933–941, 946–954 |
| _jetson_cboot.py | 262 | 262 | 0% | 20, 22–25, 27–29, 35–38, 40–41, 57–58, 60, 62–63, 69, 73, 132, 135, 137–138, 141, 148–149, 157–158, 161, 167–168, 176, 185–189, 191, 197, 200–201, 207, 210–211, 216–217, 219, 225–226, 229–230, 233–235, 237, 243, 248–250, 252–254, 259, 261–264, 266–267, 276–277, 280–281, 286–287, 290–294, 297–298, 303–304, 307, 310–314, 319–322, 325, 328–329, 332, 335–336, 339, 343–348, 352–353, 357, 360–361, 364, 367–370, 372, 375–376, 380, 383, 386–389, 391, 398, 402–403, 406–407, 413–414, 420, 422–423, 427, 429, 431–433, 436, 440, 443, 446–447, 449, 452, 460–461, 468, 478, 481, 489–490, 495–498, 500, 507, 509–511, 517–518, 522–523, 526, 530, 533, 535, 542–546, 548, 560–563, 566, 569, 571, 578, 582–583, 585–586, 588–590, 592, 594, 597, 600, 603, 605–606, 609–613, 617–619, 621, 629–633, 635, 638, 642, 645, 656–657, 662, 672, 675–683, 687–696, 700–709, 713, 715–717, 719–720, 722–723 |
| _jetson_common.py | 172 | 45 | 73% | 132, 140, 288–291, 294, 311, 319, 354, 359–364, 382, 408–409, 411–413, 417–420, 422–423, 425–429, 431, 438–439, 442–443, 453, 456–457, 460, 462, 506–507 |
| _jetson_uefi.py | 397 | 271 | 31% | 127–129, 134–135, 154–156, 161–164, 331, 449, 451–454, 458, 462–463, 465–473, 475, 487–488, 491–492, 495–496, 499–501, 505–506, 511–513, 517, 521–522, 525–526, 529–530, 534, 537–538, 540, 545–546, 550, 553–554, 559, 563–565, 569–571, 573, 577–580, 582–583, 605–606, 610–611, 613, 617, 621–622, 625–626, 633, 636–638, 641, 643–644, 649–650, 653–656, 658–659, 664, 666–667, 675, 678–681, 683–684, 686, 690–691, 695, 703–707, 710–711, 713, 716–720, 723, 726–730, 734–735, 738–743, 746–747, 750–753, 755–756, 763–764, 774–777, 780, 783–786, 789–793, 796–797, 800, 803–806, 809, 811, 816–817, 820, 823–826, 828, 834, 839–840, 859–860, 863, 871–872, 879, 889, 892, 899–900, 905–908, 916–919, 927–928, 940–943, 945, 948, 951, 959, 970–972, 974–976, 978–982, 987–988, 990, 1003, 1007, 1010, 1020, 1025, 1033–1034, 1037, 1041, 1043–1045, 1051–1052, 1057, 1065–1072, 1077–1085, 1090–1098, 1104–1106 |
| _rpi_boot.py | 295 | 141 | 52% | 56, 59, 123–124, 128, 136–139, 153–156, 163–164, 166–167, 172–173, 176–177, 186–187, 225, 231–235, 238, 256–258, 262–264, 269–271, 275–277, 287–288, 291, 294, 296–297, 299–300, 303, 305–307, 309–311, 317, 320–321, 331–334, 342–347, 349, 351–352, 357–358, 365–371, 402, 404–407, 417–420, 424, 427–428, 430–435, 463–466, 485–488, 493, 496, 514–517, 522–530, 535–543, 560–563, 569–571, 574, 577 |
| configs.py | 55 | 0 | 100% | |
| protocol.py | 4 | 0 | 100% | |
| selecter.py | 41 | 29 | 29% | 45–47, 50–51, 55–56, 59–61, 64, 66, 70, 78–80, 82–83, 85–86, 90, 92, 94–95, 97, 99–100, 102, 104 |
| src/otaclient/configs | | | | |
| _common.py | 8 | 0 | 100% | |
| ecu_info.py | 58 | 1 | 98% | 108 |
| proxy_info.py | 52 | 2 | 96% | 88, 90 |
| src/otaclient/create_standby | | | | |
| __init__.py | 12 | 5 | 58% | 29–31, 33, 35 |
| common.py | 224 | 44 | 80% | 62, 65–66, 70–72, 74, 78–79, 81, 127, 175–177, 179–181, 183, 186–189, 193, 204, 278–279, 281–286, 298, 335, 363, 366–368, 384–385, 399, 403, 425–426 |
| interface.py | 5 | 0 | 100% | |
| rebuild_mode.py | 97 | 9 | 90% | 93–95, 107–112 |
| src/otaclient_api/v2 | | | | |
| api_caller.py | 39 | 6 | 84% | 45–47, 83–85 |
| api_stub.py | 17 | 0 | 100% | |
| types.py | 256 | 23 | 91% | 86, 89–92, 131, 209–210, 212, 259, 262–263, 506–508, 512–513, 515, 518–519, 522–523, 586 |
| src/otaclient_common | | | | |
| __init__.py | 34 | 8 | 76% | 42–44, 61, 63, 69, 76–77 |
| common.py | 156 | 18 | 88% | 47, 202, 205–207, 222, 229–231, 297–299, 309, 318–320, 366, 370 |
| downloader.py | 199 | 10 | 94% | 107–108, 126, 153, 369, 424, 428, 516–517, 526 |
| linux.py | 61 | 15 | 75% | 51–53, 59, 69, 74, 76, 108–109, 133–134, 190, 195–196, 198 |
| logging.py | 29 | 1 | 96% | 55 |
| persist_file_handling.py | 118 | 18 | 84% | 113, 118, 150–152, 163, 192–193, 228–232, 242–244, 246–247 |
| proto_streamer.py | 42 | 8 | 80% | 33, 48, 66–67, 72, 81–82, 100 |
| proto_wrapper.py | 398 | 48 | 87% | 87, 165, 172, 184–186, 205, 210, 221, 257, 263, 268, 299, 303, 307, 402, 462, 469, 472, 492, 499, 501, 526, 532, 535, 537, 562, 568, 571, 573, 605, 609, 611, 625, 642, 669, 672, 676, 707, 713, 760–763, 765, 803–805 |
| retry_task_map.py | 105 | 5 | 95% | 158–159, 161, 181–182 |
| typing.py | 25 | 3 | 88% | 69–70, 72 |
| TOTAL | 6345 | 1690 | 73% | |

| Tests | Skipped | Failures | Errors | Time |
|---:|---:|---:|---:|---|
| 217 | 0 💤 | 0 ❌ | 0 🔥 | 13m 27s ⏱️ |

Zhenfeng-Sun requested review from Bodong-Yang and a team March 18, 2025 02:59
```diff
@@ -659,7 +660,8 @@ def _process_metadata_jwt(self) -> _MetadataJWTClaimsLayout:
         with NamedTemporaryFile(prefix="metadata_jwt", dir=self.run_dir) as meta_f:
             _downloaded_meta_f = Path(meta_f.name)
 
-            while not _shutdown:
+            _retry_counter = 0
+            while _retry_counter < cfg.DOWNLOAD_GROUP_INACTIVE_TIMEOUT:
```
airkei (Collaborator) commented Mar 18, 2025

  • This should be _retry_counter * self.retry_interval < cfg.DOWNLOAD_GROUP_INACTIVE_TIMEOUT
    (or use _retry_counter += self.retry_interval at line 693).
  • Is it safe to remove the not _shutdown condition? I think the condition should be while (not _shutdown) and <<new condition>>.
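The first review point is a unit mismatch: _retry_counter counts attempts, while cfg.DOWNLOAD_GROUP_INACTIVE_TIMEOUT is a duration (assumed here to be in seconds). The sketch below illustrates the suggested condition; should_keep_retrying and its parameters are hypothetical stand-ins for the loop's state, not otaclient code. It converts attempts to elapsed time and keeps the shutdown check, as the second point asks.

```python
def should_keep_retrying(
    retry_counter: int,
    retry_interval: float,
    inactive_timeout: float,
    shutdown: bool,
) -> bool:
    """Return True while another download attempt is allowed.

    retry_counter counts attempts, so it is converted to elapsed seconds
    (attempts multiplied by the interval) before being compared with the
    timeout, which is assumed to be in seconds.
    """
    elapsed = retry_counter * retry_interval
    # Keep honoring shutdown requests in addition to the new time limit.
    return (not shutdown) and elapsed < inactive_timeout
```

For example, with a hypothetical 2-second retry interval and a 300-second timeout, comparing the raw counter against 300 would allow 300 attempts (about 600 seconds of retrying), whereas this condition stops after 150 attempts.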

airkei (Collaborator) left a comment

What is the expected behavior when a timeout occurs? Currently, it seems to continue parsing the metadata even though the download timed out.

Bodong-Yang changed the title from fix(otaclient): Add timeout when downloading "metadata.jwt" and *.pem files to fix(v3.8.x): Add timeout when downloading "metadata.jwt" and *.pem files Mar 18, 2025
Bodong-Yang changed the title from fix(v3.8.x): Add timeout when downloading "metadata.jwt" and *.pem files to fix(otaclient v3.8.x): Add timeout when downloading "metadata.jwt" and *.pem files Mar 18, 2025
Zhenfeng-Sun (Author)
@airkei

> What is the expected behavior when a timeout occurs? Currently, it seems to continue parsing the metadata even though the download timed out.

After 5 minutes, I see this log:

```
Mar 18 15:16:52 autoware-ecu python3[5205]: Traceback (most recent call last):
Mar 18 15:16:52 autoware-ecu python3[5205]: File "/opt/ota/client/venv/lib/python3.10/site-packages/otaclient/app/ota_client.py", line 661, in update
Mar 18 15:16:52 autoware-ecu python3[5205]: self._update_executor.execute()
Mar 18 15:16:52 autoware-ecu python3[5205]: File "/opt/ota/client/venv/lib/python3.10/site-packages/otaclient/app/ota_client.py", line 553, in execute
Mar 18 15:16:52 autoware-ecu python3[5205]: self._execute_update()
Mar 18 15:16:52 autoware-ecu python3[5205]: File "/opt/ota/client/venv/lib/python3.10/site-packages/otaclient/app/ota_client.py", line 415, in _execute_update
Mar 18 15:16:52 autoware-ecu python3[5205]: raise ota_errors.MetadataJWTVerficationFailed(
Mar 18 15:16:52 autoware-ecu python3[5205]: otaclient.app.errors.MetadataJWTVerficationFailed: failed to verify metadata.jwt: MetadataJWTVerificationFailed('failed to verify metadata against sign cert: Error([])')
Mar 18 15:16:52 autoware-ecu python3[5205]: ------ end of exception traceback ------
Mar 18 15:16:57 autoware-ecu python3[5205]: [2025-03-18 15:16:57,678][WARNING]-otaclient.app.ota_client_stub:_generate_overall_status_report:445,new failed ECU(s) detected: {'autoware'}, current self.failed_ecus_id={'autoware'}
```


This tells FMS that the ECU was not updated successfully.

airkei (Collaborator) commented Mar 18, 2025

Thank you. So _parser.verify_metadata raises the exception when there is no valid metadata.
We could instead raise the exception without calling _parser.verify_metadata when there is no metadata file or the downloaded size is 0, but I think the current implementation is also acceptable.
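The fail-fast alternative mentioned here could look roughly like the sketch below. ensure_metadata_downloaded and MetadataDownloadingFailed are hypothetical names (the latter borrows the exception name proposed later in this thread), not existing otaclient APIs. Because the download target in the diff above is created up front by NamedTemporaryFile, the file normally exists even when nothing was downloaded, so the size check is the part that does the work.

```python
from pathlib import Path


class MetadataDownloadingFailed(Exception):
    """Hypothetical error: metadata.jwt is missing or was never downloaded."""


def ensure_metadata_downloaded(path: Path) -> None:
    """Raise before signature verification if nothing was downloaded."""
    # A missing or zero-byte file means the download loop gave up;
    # verifying it would only produce an unrelated verification error.
    if not path.is_file() or path.stat().st_size == 0:
        raise MetadataDownloadingFailed(f"{path} is missing or empty")
```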

airkei (Collaborator) left a comment

LGTM, thank you 🚀

Bodong-Yang (Member) commented Mar 21, 2025

> @airkei
>
> > What is the expected behavior when a timeout occurs? Currently, it seems to continue parsing the metadata even though the download timed out.
>
> After 5 minutes, I see this log:
>
> Mar 18 15:16:52 autoware-ecu python3[5205]: Traceback (most recent call last):
> Mar 18 15:16:52 autoware-ecu python3[5205]: File "/opt/ota/client/venv/lib/python3.10/site-packages/otaclient/app/ota_client.py", line 661, in update
> Mar 18 15:16:52 autoware-ecu python3[5205]: self._update_executor.execute()
> Mar 18 15:16:52 autoware-ecu python3[5205]: File "/opt/ota/client/venv/lib/python3.10/site-packages/otaclient/app/ota_client.py", line 553, in execute
> Mar 18 15:16:52 autoware-ecu python3[5205]: self._execute_update()
> Mar 18 15:16:52 autoware-ecu python3[5205]: File "/opt/ota/client/venv/lib/python3.10/site-packages/otaclient/app/ota_client.py", line 415, in _execute_update
> Mar 18 15:16:52 autoware-ecu python3[5205]: raise ota_errors.MetadataJWTVerficationFailed(
> Mar 18 15:16:52 autoware-ecu python3[5205]: otaclient.app.errors.MetadataJWTVerficationFailed: failed to verify metadata.jwt: MetadataJWTVerificationFailed('failed to verify metadata against sign cert: Error([])')
> Mar 18 15:16:52 autoware-ecu python3[5205]: ------ end of exception traceback ------
> Mar 18 15:16:57 autoware-ecu python3[5205]: [2025-03-18 15:16:57,678][WARNING]-otaclient.app.ota_client_stub:_generate_overall_status_report:445,new failed ECU(s) detected: {'autoware'}, current self.failed_ecus_id={'autoware'}
>
> This tells FMS that the ECU was not updated successfully.

@Zhenfeng-Sun @airkei This behavior is not intended. The change simply breaks out of the ota_metadata download loop after 5 minutes, lets the code continue to the verification step (where the ota_metadata validation has nothing to validate), and finally raises an unrelated error, MetadataJWTVerficationFailed.

You should make the ota_metadata downloading part raise something like a MetadataDownloadingFailed exception, and have ota_client.py handle this MetadataDownloadingFailed and log the situation.
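A minimal sketch of this suggestion, assuming hypothetical names: download_metadata_jwt stands in for the ota_metadata download step and run_update for the handling in ota_client.py. Only the exception name MetadataDownloadingFailed comes from the comment; the real otaclient error classes and status reporting are not reproduced here. The point is that the download failure surfaces as its own exception and is logged as such, instead of falling through to an unrelated MetadataJWTVerficationFailed.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


class MetadataDownloadingFailed(Exception):
    """Raised by the metadata side when metadata.jwt cannot be downloaded."""


def download_metadata_jwt(fetch: Callable[[], bytes]) -> bytes:
    """Metadata side: turn a download failure into one specific exception."""
    try:
        return fetch()
    except OSError as exc:  # also covers ConnectionError and TimeoutError
        raise MetadataDownloadingFailed("failed to download metadata.jwt") from exc


def run_update(fetch: Callable[[], bytes]) -> bool:
    """Client side: handle the download failure before any verification."""
    try:
        metadata_jwt = download_metadata_jwt(fetch)
    except MetadataDownloadingFailed as exc:
        # Log the real cause and report failure instead of verifying nothing.
        logger.error("OTA update failed: %s (cause: %r)", exc, exc.__cause__)
        return False
    # Verification of metadata_jwt against the sign certificate would go here.
    logger.info("downloaded metadata.jwt (%d bytes)", len(metadata_jwt))
    return True
```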

Bodong-Yang (Member) commented Mar 21, 2025

Please refine your changes; this PR is not yet ready for merging 🙏
(I have temporarily added the don't merge label until the PR is refined.)

Bodong-Yang added the refactor (Rewrite/remove related code instead of patching them) and don't merge labels Mar 21, 2025
airkei (Collaborator) left a comment

Please fix the following point.

> You should make the ota_metadata downloading part raise something like a MetadataDownloadingFailed exception, and have ota_client.py handle this MetadataDownloadingFailed and log the situation.

Zhenfeng-Sun merged commit ce95f29 into v3.8.x Mar 24, 2025
8 checks passed
Zhenfeng-Sun deleted the Fix-metadata-download-time-issue branch March 24, 2025 04:01
Labels: refactor (Rewrite/remove related code instead of patching them)
3 participants