[202205] Backport: Rename the platform_reboot to the pre_reboot_hook, remove the sysfs power cycle #21678

jianyuewu · 2025-02-08T02:17:47Z

Backport commit #18324.

Why I did it

Backport the graceful reboot in place of the sysfs power cycle to avoid filesystem corruption.

How I did it

Rename the platform_reboot script to pre_reboot_hook and remove the sysfs power cycle function. From now on, the Debian reboot (/sbin/reboot) is executed instead of the sysfs power cycle.

How to verify it

Start watching logs by using show log -f and journalctl -p debug -f.
Execute the reboot command from the switch CLI.
Check in the logs that all systemd services terminated.

Which release branch to backport (provide reason below if selected)

[x] 202205
Because the 202205 branch is missing this change and needs to be updated.

Tested branch (Please provide the tested image version)

…atically (sonic-net#15809) src/sonic-utilities * 20853a6f - (HEAD -> 202205, origin/202205) Revert "[GCU Feature Update] Cherry-pick Platform Validator PR to 202205 (sonic-net#2883)" (sonic-net#2908) (6 hours ago) [isabelmsft]

Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>

#### Why I did it
After k8s upgrades a container, it only knows that the container is running; it does not know the status of the service inside the container. So we need a probe inside the container that k8s can call to check whether the container is really ready.

##### Work item tracking
- Microsoft ADO **(number only)**: 22453004

#### How I did it
Add a health check probe inside the config engine container. If the start service exists, the probe checks whether it exited normally. If the Python script is present, the probe calls it to run container-specific self checks. The Python script should be implemented by the feature owner if it is needed. More details: [design doc](https://github.com/sonic-net/SONiC/blob/master/doc/kubernetes/health-check.md)

#### How to verify it
Check path /usr/bin/readiness_probe.sh inside container.

#### Which release branch to backport (provide reason below if selected)
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [x] 202205
- [x] 202211

#### Tested branch (Please provide the tested image version)
- [x] 20220531.28
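The probe behaviour described above can be sketched roughly in Python. This is only an illustration, not the contents of /usr/bin/readiness_probe.sh: the function name `container_ready`, the use of an exit code to represent "the start service exited normally", and the way the optional check script is invoked are all assumptions made for the example.

```python
import os
import subprocess
import sys
from typing import Optional


def container_ready(start_exit_code: Optional[int],
                    check_script: Optional[str] = None) -> bool:
    """Decide whether the container should be reported as ready.

    start_exit_code: exit status of the start service, or None when the
        container has no start service (hypothetical input).
    check_script: path to an optional feature-owner check script
        (hypothetical; skipped when the file does not exist).
    """
    # If a start service exists, it must have exited normally.
    if start_exit_code is not None and start_exit_code != 0:
        return False
    # Run the container-specific check only when the script is present.
    if check_script and os.path.isfile(check_script):
        result = subprocess.run([sys.executable, check_script])
        return result.returncode == 0
    return True
```

Keeping both checks optional mirrors the description: a container without a start service or without a feature-specific script is still considered ready.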

…D automatically (sonic-net#15832) src/sonic-platform-daemons * bef58aa - (HEAD -> 202205, origin/202205) Added PCIe transaction check for all peripherals on the bus (sonic-net#331) (10 hours ago) [Ashwin Srinivasan]

…lly (sonic-net#15834) src/sonic-swss * 493f66f5 - (HEAD -> 202205, origin/202205) Remove redundant updateFabricPortState (sonic-net#2850) (6 hours ago) [kenneth-arista]

…tically (sonic-net#15833) src/sonic-sairedis * 56aee6c - (HEAD -> 202205, origin/202205) [202205] Advance SAI submodule to fix issue sonic-net#14706 (sonic-net#1265) (10 hours ago) [abdosi]

…atically (sonic-net#15835)

src/sonic-utilities
* bc7c7929 - (HEAD -> 202205, origin/202205) Add FEC correctable and uncorrectable port stats (sonic-net#2027) (10 hours ago) [Prince George]
* 58db48ad - [show][muxcable] update `show mux tunnel-route` to check soc_ipv6 as well (10 hours ago) [Jing Zhang]
* 24fc1db8 - [dualtor][route_check] filter out `soc_ipv6` (sonic-net#2899) (10 hours ago) [Jing Zhang]
* d89d4832 - [route_check][dualtor] Ignore vlan neighbor route miss (sonic-net#2888) (10 hours ago) [Longxiang Lyu]

…start when change to local mode (sonic-net#15432) (sonic-net#15839)

Why I did it

During an upgrade via k8s, the feature's systemd service restarts as well. Every feature systemd service has a restart limit, and the limit is small: only three restarts. If a fallback happens during the upgrade, the start count is already 2, so one more restart takes the systemd service down. We need to bypass this limit. The restart function is called on local -> kube, kube -> kube, and kube -> local transitions, and each call needs to restart successfully, so we run reset-failed before every restart. When going back to local mode, we restart the systemd service immediately, without waiting for the default restart interval, to reduce container downtime.

Work item tracking

Microsoft ADO (number only): 24172368

How I did it

Before every restart for an upgrade, reset the feature's restart count to 0 to bypass the restart limit. When going back to local mode, restart the systemd service immediately.

How to verify it

The feature's systemd service can always be restarted successfully during the upgrade process via k8s.
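As a rough illustration of the "reset, then restart" sequence described above, the sketch below builds the two systemctl invocations and runs them in order. It is not the code from the PR: the helper names `restart_commands` and `restart_feature` and the injectable `run` parameter are hypothetical. The only external behaviour relied on is that `systemctl reset-failed UNIT` clears the unit's failed state and start-rate counter.

```python
import subprocess
from typing import Callable, List


def restart_commands(service: str) -> List[List[str]]:
    """Commands that restart a unit without tripping its start limit."""
    return [
        # Clear the failed state and the start-rate counter first.
        ["systemctl", "reset-failed", service],
        # Then restart right away instead of waiting for the restart interval.
        ["systemctl", "restart", service],
    ]


def restart_feature(service: str,
                    run: Callable = subprocess.run) -> bool:
    """Run the commands in order; stop at the first failure."""
    for cmd in restart_commands(service):
        if run(cmd).returncode != 0:
            return False
    return True
```

Passing `run` as a parameter keeps the sketch testable without touching systemd; on a real device the default `subprocess.run` would be used.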

…et#15840)

Why I did it

The current code for cleaning up container images has two bugs that need to be fixed. Some variable names are also confusing, so they are renamed.

Work item tracking

Microsoft ADO (number only): 24502294

How I did it

We clean up after tagging latest successfully. Currently the tag-latest function returns only 0 or 1: 0 means success and 1 means failure. On 1 we retry; on 0 we clean up. However, 0 also covers a case in which no clean-up is needed: the container image we want to tag may not be running, so we cannot tag latest and do not need to clean up. This case is now separated from 0 and returns -1.

When local mode (v1) -> kube mode (v2) happens, the local image has to be handled, and there are two cases. In the first case, a kube v1 container was dry-run (because we don't replace the local container if the kube version equals the local version); we remove the kube v1 image, tag the local version with the ACR prefix, and remove the local v1 local tag. In the second case, no kube v1 container was dry-run; we remove the local v1 image directly, because the local v1 image should not be the last desired version.

The docker_id variable is confusing because it actually holds the docker image ID, so it is renamed. The two dicts and the list are also renamed to be more readable.

How to verify it

Check the tag-latest and image clean-up results.
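The three return codes described above can be summarised in a small dispatch sketch. The constant names and the `next_action` helper are hypothetical and only illustrate how a caller might tell the cases apart; the values 0, 1, and -1 are the ones given in the description.

```python
TAG_OK = 0      # tagged latest successfully: clean-up should follow
TAG_RETRY = 1   # tagging failed: retry later
TAG_SKIP = -1   # image to tag is not running: nothing to tag, no clean-up


def next_action(tag_result: int) -> str:
    """Map the tag-latest return code to the follow-up step."""
    if tag_result == TAG_OK:
        return "cleanup"
    if tag_result == TAG_RETRY:
        return "retry"
    if tag_result == TAG_SKIP:
        return "skip"
    raise ValueError(f"unexpected tag-latest result: {tag_result}")
```

Separating -1 from 0 is what prevents a clean-up from running when nothing was actually tagged.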

…onic-net#15487) (sonic-net#15826)

Modify snmpd.conf so that snmpd listens on specific management and loopback IPs instead of listening on any IP.

#### Why I did it
SNMP over IPv6 is not working for all scenarios on single-asic platforms. The expectation is that an SNMP query over IPv6 should work over the Management or Loopback0 addresses.

**Specific scenario where this issue is seen**

In case of a Lab T0 device, when an SNMP request is sent from a directly connected T1 neighbor over the Loopback IP, the SNMP response was not received. This was because the SRC IP address in the SNMP response was not the Loopback IP; it was the PortChannel IP connected to the neighboring device.
```
23:18:51.620897 In 22:26:27:e6:e0:07 ethertype IPv6 (0x86dd), length 105: fc00::72.41725 > **fc00:1::32**.161: C="msft" **GetRequest**(28) .1.3.6.1.2.1.1.1.0
23:18:51.621441 Out 28:99:3a:a0:97:30 ethertype IPv6 (0x86dd), length 241: **fc00::71**.161 > fc00::72.41725: C="msft" **GetResponse**(162) .1.3.6.1.2.1.1.1.0="SONiC Software Version: SONiC.xxx - HwSku: xx - Distribution: Debian 10.13 - Kernel: 4.19.0-12-2-amd64"
```
In case of IPv4, the SRC IP in the SNMP response was correctly set to the Loopback IP.
```
23:25:32.769712 In 22:26:27:e6:e0:07 ethertype IPv4 (0x0800), length 85: 10.0.0.57.56701 > **10.1.0.32**.161: C="msft" **GetRequest**(28) .1.3.6.1.2.1.1.1.0
23:25:32.975967 Out 28:99:3a:a0:97:30 ethertype IPv4 (0x0800), length 221: **10.1.0.32**.161 > 10.0.0.57.56701: C="msft" **GetResponse**(162) .1.3.6.1.2.1.1.1.0="SONiC Software Version: SONiC.xxx - HwSku: xx - Distribution: Debian 10.13 - Kernel: 4.19.0-12-2-amd64"
```
**Sequence of SNMP request and response**
1. The SNMP request is sent with SRC IP fc00::72 and DST IP fc00:1::32.
2. The SNMP request received at the SONiC device is sent to snmpd, which is listening on port 161 (:::161/).
3. The snmpd process parses the request, creates a response, and sends it to DST IP fc00::72. The snmpd process does not track the DST IP on which the SNMP request was received, which in this case is the Loopback IP. It only keeps track of the IP to which the response should be sent.
4. The snmpd process sends the response packet.
5. The kernel does a route lookup on the destination IP and finds the best path.
ip -6 route get fc00::72
fc00::72 from :: dev PortChannel101 proto kernel src fc00::71 metric 256 pref medium
6. Using the "src" IP from above, the response is sent out. This SRC IP is that of the PortChannel and not the device Loopback IP.

The same issue is seen when an SNMP query is sent from a remote server over the Management IP.
SONiC device eth0 --------- Remote server
SNMP request comes with SRC IP <Remote_server> DST IP <Mgmt IP>
If the kernel finds that the best route to Remote_server_IP is via BGP neighbors, it sends the response via a front-panel interface with the Loopback IP as SRC IP instead of the Management IP.

The main issue is that, in case of IPv6, snmpd ignores the IP address to which the SNMP request was sent.
In case of IPv4, snmpd keeps track of the DST IP of the SNMP request, that is, whether the request was sent to the mgmt IP or the Loopback IP. Later, this IP is used in ipi_spec_dst as SRC IP, which helps the kernel find the route based on DST IP using the right SRC IP.
https://github.com/net-snmp/net-snmp/blob/master/snmplib/transports/snmpUDPBaseDomain.c#L300
ipi.ipi_spec_dst.s_addr = srcip->s_addr
Reference: https://man7.org/linux/man-pages/man7/ip.7.html
```
If IP_PKTINFO is passed to sendmsg(2) and ipi_spec_dst is not zero, then it is used as the local source address for the routing table lookup and for setting up IP source route options. When ipi_ifindex is not zero, the primary local address of the interface specified by the index overwrites ipi_spec_dst for the routing table lookup.
```
**This issue is not seen on multi-asic platform, why?**

On a multi-asic platform, there are different network namespaces. The SNMP docker with the snmpd process runs in the host namespace. The Management interface belongs to the host namespace. Loopback0 is configured in the asic namespaces. Additional information on how a packet coming over the Loopback IP reaches the snmpd process running in the host namespace: sonic-net#5420
Because of this separation of network namespaces, the route lookup of the destination IP is confined to the routing table of the specific namespace where the packet is received. If a packet is received over the management interface, the SNMP response is also sent out of the management interface. The same goes for a packet received over the Loopback IP.

##### Work item tracking
- Microsoft ADO **17537063**:

#### How I did it
Have snmpd listen on specific Management and Loopback IPs instead of listening on any IP for single-asic platforms.

Before fix
```
admin@xx:~$ sudo netstat -tulnp | grep 161
udp 0 0 0.0.0.0:161 0.0.0.0:* 15631/snmpd
udp6 0 0 :::161 :::* 15631/snmpd
```
After fix
```
admin@device:~$ sudo netstat -tulnp | grep 161
udp 0 0 10.1.0.32:161 0.0.0.0:* 215899/snmpd
udp 0 0 10.3.1.1:161 0.0.0.0:* 215899/snmpd
udp6 0 0 fc00:1::32:161 :::* 215899/snmpd
udp6 0 0 fc00:2::32:161 :::* 215899/snmpd
```
**How this change helps with the issue?**

To see snmpd trace logs, modify snmpd to start with the parameters below, in the supervisord.conf file:
```
/usr/sbin/snmpd -f -LS0-7i -Lf /var/log/snmpd.log
```
When snmpd listens on any IP, snmpd binds to IPv4 and IPv6 sockets as below:
```
netsnmp_udpbase: binding socket: 7 to UDP: [0.0.0.0]:0->[0.0.0.0]:161
trace: netsnmp_udp6_transport_bind(): transports/snmpUDPIPv6Domain.c, 303:
netsnmp_udpbase: binding socket: 8 to UDP/IPv6: [::]:161
```
When an IPv4 response is sent, it goes out of fd 7, and an IPv6 response goes out of fd 8.
When an IPv6 response is sent, it does not have the right SRC IP, which can lead to the issue described.

When snmpd listens on specific Loopback/Management IPs, snmpd binds to different sockets:
```
trace: netsnmp_udpipv4base_transport_bind(): transports/snmpUDPIPv4BaseDomain.c, 207:
netsnmp_udpbase: binding socket: 7 to UDP: [0.0.0.0]:0->[10.250.0.101]:161
trace: netsnmp_udpipv4base_transport_bind(): transports/snmpUDPIPv4BaseDomain.c, 207:
netsnmp_udpbase: binding socket: 8 to UDP: [0.0.0.0]:0->[10.1.0.32]:161
trace: netsnmp_register_agent_nsap(): snmp_agent.c, 1261:
netsnmp_register_agent_nsap: fd 8
netsnmp_udpbase: binding socket: 10 to UDP/IPv6: [fc00:1::32]:161
trace: netsnmp_register_agent_nsap(): snmp_agent.c, 1261:
netsnmp_register_agent_nsap: fd 10
netsnmp_ipv6: fmtaddr: t = (nil), data = 0x7fffed4c85d0, len = 28
trace: netsnmp_udp6_transport_bind(): transports/snmpUDPIPv6Domain.c, 303:
netsnmp_udpbase: binding socket: 9 to UDP/IPv6: [fc00:2::32]:161
```
When an SNMP request comes in via Loopback IPv4, the SNMP response is sent out of fd 8
```
trace: netsnmp_udpbase_send(): transports/snmpUDPBaseDomain.c, 511:
netsnmp_udp: send 170 bytes from 0x5581f2fbe30a to UDP: [10.0.0.33]:46089->[10.1.0.32]:161 on fd 8
```
When an SNMP request comes in via Loopback IPv6, the SNMP response is sent out of fd 10
```
netsnmp_ipv6: fmtaddr: t = (nil), data = 0x5581f2fc2ff0, len = 28
trace: netsnmp_udp6_send(): transports/snmpUDPIPv6Domain.c, 164:
netsnmp_udp6: send 170 bytes from 0x5581f2fbe30a to UDP/IPv6: [fc00::42]:43750 on fd 10
```
#### How to verify it
Verified on single-asic and multi-asic devices.

Single-asic SNMP query with Loopback
```
ARISTA01T1#bash snmpget -v2c -c xxx 10.1.0.32 1.3.6.1.2.1.1.1.0
SNMPv2-MIB::sysDescr.0 = STRING: SONiC Software Version: SONiC.xx - HwSku: Arista-7260xx - Distribution: Debian 10.13 - Kernel: 4.19.0-12-2-amd64
ARISTA01T1#bash snmpget -v2c -c xxx fc00:1::32 1.3.6.1.2.1.1.1.0
SNMPv2-MIB::sysDescr.0 = STRING: SONiC Software Version: SONiC.xx - HwSku: Arista-7260xxx - Distribution: Debian 10.13 - Kernel: 4.19.0-12-2-amd64
```
On multi-asic -- no change.
```
sudo netstat -tulnp | grep 161
udp 0 0 0.0.0.0:161 0.0.0.0:* 17978/snmpd
udp6 0 0 :::161 :::* 17978/snmpd
```
Query result using Loopback IP from a directly connected BGP neighbor
```
ARISTA01T2#bash snmpget -v2c -c xxx 10.1.0.32 1.3.6.1.2.1.1.1.0
SNMPv2-MIB::sysDescr.0 = STRING: SONiC Software Version: SONiC.xx - HwSku: xx - Distribution: Debian 9.13 - Kernel: 4.9.0-14-2-amd64
ARISTA01T2#bash snmpget -v2c -c xxx fc00:1::32 1.3.6.1.2.1.1.1.0
SNMPv2-MIB::sysDescr.0 = STRING: SONiC Software Version: SONiC.xx - HwSku: xx - Distribution: Debian 9.13 - Kernel: 4.9.0-14-2-amd64
```

Co-authored-by: SuvarnaMeenakshi <50386592+SuvarnaMeenakshi@users.noreply.github.com>
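To make the "listen on specific IPs" idea above concrete, the sketch below generates one Net-SNMP `agentAddress` directive per management or loopback address. It is a hedged illustration rather than the template change in this PR: the helper name `agent_address_lines` is hypothetical, and the exact lines produced by the SONiC snmpd.conf template may differ. The `udp:` and `udp6:[...]` transport specifiers are standard Net-SNMP listening-address syntax.

```python
import ipaddress
from typing import Iterable, List


def agent_address_lines(ips: Iterable[str], port: int = 161) -> List[str]:
    """Build one agentAddress line per listening IP (illustrative only)."""
    lines = []
    for ip in ips:
        addr = ipaddress.ip_address(ip)  # validates and normalises the address
        if addr.version == 6:
            # IPv6 literals are bracketed so the port separator is unambiguous.
            lines.append(f"agentAddress udp6:[{addr}]:{port}")
        else:
            lines.append(f"agentAddress udp:{addr}:{port}")
    return lines


# Example with the addresses that appear in the netstat output above.
for line in agent_address_lines(["10.1.0.32", "10.3.1.1", "fc00:1::32", "fc00:2::32"]):
    print(line)
```

Binding one socket per address is what lets the response leave from the socket, and therefore the source address, on which the request arrived.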

…c-net#15844)

…ce is in kube mode (sonic-net#15642)

Why I did it

When SONiC is managed by k8s, the SONiC container is managed by a k8s daemonset, and the daemonset identifies its members by labels. Currently, when a SONiC service is restarted by systemctl and the service's container is already managed by k8s, the systemd script stops the container by removing the feature label, which makes it leave the k8s daemonset, and then starts it by adding the label, which makes it join the daemonset again.

This behavior causes a problem during k8s container upgrade. Containers in a daemonset are upgraded in a rolling fashion: the daemonset version is updated first, and then the new version is rolled out to containers one by one with precheck/postcheck. However, when a SONiC device joins a daemonset, k8s directly deploys a pod with the current daemonset version. This is expected when a device joins the k8s cluster for the first time. But for a device that has already joined the cluster, re-joining the daemonset upgrades the container to the new version without precheck. So if a systemd service is restarted during a daemonset upgrade, the container may be upgraded without precheck, which breaks the rolling update policy. To fix it, we remove the logic that drops the k8s label in the systemd service stop script for kube mode.

Work item tracking

Microsoft ADO (number only): 24304563

How I did it

Don't drop the label in the systemd service stop script when the feature's set_owner is kube. Only drop the label when the feature's set_owner is local.

How to verify it

The label feature_enabled should always be true if the feature's set_owner is kube.
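The rule described above reduces to a single decision in the stop path. The sketch below is a hypothetical restatement of it in Python; the function name `should_drop_label_on_stop` is invented for the example, while the `set_owner` values `kube` and `local` come from the description.

```python
def should_drop_label_on_stop(set_owner: str) -> bool:
    """Return True only when the stop script may remove the feature label.

    In kube mode the label stays in place, so the device never leaves and
    re-joins the daemonset just because a systemd service was restarted.
    """
    return set_owner == "local"
```

With this rule a restart in kube mode no longer triggers a fresh deployment from the daemonset, so the rolling-update prechecks are not bypassed.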

To pick the below fixes:

DNX fixes:
- Temporarily revert fix for CS00012287482 - support for 1024 LAGs on DNX
- CS00012297599 - [J2C+] sonic-mgmt failure in test_copp.py (test_no_policer[BGP])
- CS00012293560 - ECN remark issue in SONiC
- CS00012302371 - SONiC: V6 packets were mapped to wrong TC queue
- CS00012288540 - Available ACL Entry and Counter is incorrect after removing ACL rules

Other changes (XGS fixes)
- SID - L3 multicast packet drop due to wrong VFI derivation - SDK-350470
- SID - SIGSEGV in linkscan callback delivery - SDK-287578
- SID - Repeated VXLAN calls deletes vlan translation action profile SDK-313980
- SER - error in IS_TDM_CALENDAR0/1 can cause traffic hit in TH
- SID - L2_ENTRY Table Lookups May Miss
- [CSP CS00012275452] sai_object_type_get_availability failed with SAI_STATUS_INVALID_PARAMETER
- [CSP CS00012253527] sai_query_attribute_capability for obj type SAI_OBJECT_TYPE_SWITCH

Graceful restart is a key event for bgpd, but the related log messages are printed at debug level. Change them to info level to get more visibility when this kind of event is triggered.

…5890) (sonic-net#15907)

Why I did it

Fix the armhf build failure.

How to reproduce the issue:
docker run -it debian:bullseye bash
apt-get update && apt-get install -y python3-pip
pip3 install PyYAML==5.4.1

Error message:
Collecting PyYAML==5.4.1
Installing build dependencies ... done
Getting requirements to build wheel ... error
ERROR: Command errored out with exit status 1:
command: /usr/bin/python3 /tmp/tmp6xabslgb_in_process.py get_requires_for_build_wheel /tmp/tmp_er01ztl
....
raise AttributeError(attr)
AttributeError: cython_sources
----------------------------------------
WARNING: Discarding https://files.pythonhosted.org/packages/a0/a4/d63f2d7597e1a4b55aa3b4d6c5b029991d3b824b5bd331af8d4ab1ed687d/PyYAML-5.4.1.tar.gz#sha256=607774cbba28732bfa802b54baa7484215f530991055bb562efbed5b2f20a45e (from https://pypi.org/simple/pyyaml/) (requires-python:>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*). Command errored out with exit status 1: /usr/bin/python3 /tmp/tmp6xabslgb_in_process.py get_requires_for_build_wheel /tmp/tmp_er01ztl Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement PyYAML==5.4.1
ERROR: No matching distribution found for PyYAML==5.4.1
root@fa2fa92edcfd:/#

With the option --no-build-isolation added, the install succeeds; see the fix.
install "PyYAML==5.4.1" --no-build-isolation

The same error can be found in multiple builds.

Work item tracking

Microsoft ADO (number only): 24567457

How I did it

Add the build option --no-build-isolation.

…face (sonic-net#15881) sonic-build image side change to fix source interface selection in dual tor scenario. dhcprelay related PR: [master]fix dhcpv6 relay dual tor source interface selection issue sonic-dhcp-relay#42 Announce dhcprelay submodule to 6a6ce24 to include PR sonic-net#42

…lly (sonic-net#15887) src/sonic-swss * ed0ca898 - (HEAD -> 202205, origin/202205) Add missing parameter to on_switch_shutdown_request method. (sonic-net#2567) (34 hours ago) [Hua Liu]

…ically (sonic-net#15886)

src/sonic-restapi
* a69ba06 - (HEAD -> 202205, origin/master, origin/HEAD, origin/202205, master) [actions] Support Semgrep by Github Actions (sonic-net#144) (3 weeks ago) [Mai Bui]
* 6b242a3 - [Ci] Upgrade python 2 to python 3 (sonic-net#145) (3 weeks ago) [xumia]
* 1c50caa - prevent downcasting of 64-bit integer (sonic-net#142) (2 months ago) [Mai Bui]
* de26989 - Use -race detector when building and testing (sonic-net#141) (3 months ago) [Lawrence Lee]
* 9fe2eff - [go] Update Go to version 1.15 (sonic-net#140) (3 months ago) [Lawrence Lee]

56aee6c0dfc4705584764edd35b7b2977bb762eb (HEAD -> 202205, origin/202205) [202205] Advance SAI submodule to fix issue sonic-net#14706 (sonic-net#1265) Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

…D automatically (sonic-net#15913) src/sonic-platform-daemons * 8147e25 - (HEAD -> 202205, origin/202205) Revert "Added PCIe transaction check for all peripherals on the bus (sonic-net#331)" (12 hours ago) [Ying Xie]

* Update WRED profile on system ports Co-authored-by: vmittal-msft <46945843+vmittal-msft@users.noreply.github.com>

…lly (sonic-net#15928) src/sonic-swss * 17c4d731 - (HEAD -> 202205, origin/202205) Remove system neighbor DEL operation in m_toSync if SET operation for (sonic-net#2853) (3 hours ago) [Song Yuan]

…atically (sonic-net#15930)

src/sonic-utilities
* 99864640 - (HEAD -> 202205, origin/202205) [dualtor] Add script to verify consistency between kernel and ASIC (sonic-net#2840) (87 minutes ago) [Longxiang Lyu]
* a32ddc1b - [show][muxcable] update `show mux config` to print out `soc_ipv6` as well (sonic-net#2909) (88 minutes ago) [Jing Zhang]
* 0c6d0c51 - [202205] Flush RESTAPI db in fast-reboot shutdown path (sonic-net#2921) (4 hours ago) [bingwang-ms]

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>

…test (sonic-net#15977)

Why I did it

Use the remote PR test template from sonic-mgmt master to run the PR test.

How I did it

Modify the PR test azure pipeline yml file.

How to verify it

PR test executes normally.

Signed-off-by: Chun'ang Li <chunangli@microsoft.com>

…net#18420)

- This is primarily to incorporate ungraceful reboot logic to timely bring down front panel ports
- Fix nokia_cmd show syntax and output, fix hw-management-generate-dump
- Fix SFM hotplug serial number empty issue

…-net#17797) (sonic-net#18416)

Currently, whenever isc-dhcp-relay forwards a packet upstream, internally, it will try to send it on a "fallback" interface. My understanding is that this isn't meant to be a real interface, but instead is basically saying to use Linux's regular routing stack to route the packet appropriately (rather than having isc-dhcp-relay specify which interface to use).

The problem is that on systems with a weak CPU, a large number of interfaces, and many upstream servers specified, this can introduce a noticeable delay in packets getting sent. The delay comes from trying to get the ifindex of the fallback interface. In one test case, it got to the point that only 2 packets could be processed per second. Because of this, dhcrelay will easily get backlogged and likely get to a point where packets get dropped in the kernel.

Fix this by adding a check: if we're using the fallback interface, don't try to get the ifindex of this interface. We're never going to have an interface with this name in SONiC.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Co-authored-by: Saikrishna Arcot <sarcot@microsoft.com>
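The check described above amounts to skipping the index lookup for the pseudo-interface. The actual fix is a patch to isc-dhcp-relay in C; the Python sketch below only illustrates the idea. The helper name `ifindex_for`, the injectable `lookup` parameter, and the assumption that the pseudo-interface is literally named "fallback" are all made up for the example.

```python
import socket
from typing import Callable, Optional

FALLBACK_IFNAME = "fallback"  # assumed name of the pseudo-interface


def ifindex_for(ifname: str,
                lookup: Callable[[str], int] = socket.if_nametoindex) -> Optional[int]:
    """Return the interface index, or None for the fallback pseudo-interface.

    Skipping the lookup avoids a per-packet name-to-index resolution for an
    interface that is not expected to exist on the system.
    """
    if ifname == FALLBACK_IFNAME:
        return None
    return lookup(ifname)
```

Returning `None` leaves routing to the kernel's normal stack, which is what the fallback interface is meant to express.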

…#18185)

Why I did it

deb11u1 is deprecated. Use deb11u2 instead. Other branches are not impacted, because their reproducible build version files are up to date.

Work item tracking

Microsoft ADO (number only): 26964185

How I did it

How to verify it

Add the following fixes to the SAI:
- BRCM CSP # CS00012343052 - ESB_ECC_Ecc_2bErrInt seen in production
- BRCM CSP # CS00012333604 - J2C+ internal thermal sensor(s) spiking high spontaneously, seemingly in error

…cDBConfig Global config is already initialized (sonic-net#18609)

* [portconfig]: Remove try exception during config_db initialization. (sonic-net#10960)

Why I did it

Provide fix for comment: https://github.com/Azure/sonic-buildimage/pull/10475/files#r847753187; move loading of the database config to application code instead of portconfig, as portconfig is used as a library.

How I did it

Remove try/except handling from portconfig.py during config_db initialization. Move loading of the database config to the application that uses portconfig.py.

How to verify it

Unit tests pass. Verified that it does not cause issues during boot-up of a multi-asic VS image. Verified that config_db generation was successful in multi-asic VS.

* Fix code based on the review

---------

Co-authored-by: SuvarnaMeenakshi <50386592+SuvarnaMeenakshi@users.noreply.github.com>

…ng SONiC SAI native ASIC thermal sensor polling (sonic-net#18565) These changes ensure proper thermal subsystem operation when removing ASIC internal thermal sensor polling from NDK. Also includes kernel module changes for error level messages to warning level (this change is 202205 specific and not applicable to master) as well as fix for show chassis module with correct linecard description (Nokia-ION/ndk#45).

…D automatically (sonic-net#18731)

#### Why I did it
src/sonic-platform-daemons
```
* a566959 - (HEAD -> 202205, origin/202205) [chassis][linecard] Fix Module LINECARD<> went off-line message for empty slot issue (sonic-net#462) (sonic-net#470) (8 hours ago) [mssonicbld]
```
#### How I did it
#### How to verify it
#### Description for the changelog

…ig: Soni…" (sonic-net#18743) This reverts commit 87a5820.

Why I did it

Buster is EOL, and the backports section has been removed from the main Debian repos. This means that our default mirror (for non-snapshot builds) is also affected.

Work item tracking

Microsoft ADO (number only): 27691460

How I did it

Change to using archive.debian.org directly for Buster.

How to verify it

…-net#19102)

Why I did it

Migrate agent pool to fix S360 ticket.

Work item tracking

Microsoft ADO (number only): 27889786

How I did it

How to verify it

) Synced up with BRCM master branch.

Microsoft ADO (28551006):

SAI version - 7.1.80.4
[SAI_BRANCH rel_ocp_sai_7_1] [CSP CS00012340907] Backport JIRA Backport SONIC-89800 to rel_ocp_sai_7_1
Issue Summary: Seeing traffic drops on all priorities when PFC asserted on two priorities.
Root Cause: 400G alpha settings incorrect
Fix Description: EGQ alpha tuning

SAI version - 7.1.79.4
[SAI_BRANCH rel_ocp_sai_7_1] [CSP CS00012340180] Backport JIRA SONIC-87935 to rel_ocp_sai_7_1
Issue Summary: Update the ACL implementation to enable the ACL switch bind to support PFCWD on MACSEC devices --- DNX
Root Cause: RFE
Fix Description: RF
(Info : Released before as part of 7.1.66.6)
Updated SDK with latest fixes for IPS Queue delete interrupt not clearing

SAI version - 7.1.78.4
(Info : Released before as part of 7.1.66.7)
Update git submodules
Update sdk-src/hsdk_6.5.24_SAI_7.1.0_GA from branch 'hsdk_6.5.24_SAI_7.1.0_GA' to 6d522a50666f59ecf5c0b062206696058d067204
[SAI_BRANCH rel_ocp_sai_7_1] Backport JIRA SDK-279322 to rel_ocp_sai_7_1
Issue Summary: EPNI_EtppEbdErrInt,ESB_ECC_Ecc_1bErrInt,ESB_ECC_Ecc_2bErrInt reported when inject packet with random packet size
Root Cause: RFE
Fix Description: EGQ stuck and EPNI/ESB interrupts may occur due to a wrong ETPP latency configuration. Such behavior may happen when a different stream of packets hit different databases in ETPP Termination databases (e.g. one flow runs with ESEM access and one flow without ESEM access in parallel). The issue is now fixed.
Update git submodules
Update sdk-src/hsdk_6.5.24_SAI_7.1.0_GA from branch 'hsdk_6.5.24_SAI_7.1.0_GA' to 183ee14b9930a0956e400b645d35137de0935d6f
[SAI_BRANCH rel_ocp_sai_7_1] Backport JIRA SDK-272527 to rel_ocp_sai_7_1
Issue Summary: JR2 PVTMON temp readings has rare outlier values
Root Cause: SW fix for PVT
Fix Description: SW actions to reduce the outlier value by reading 5 times and checking whether the value difference is > 5 °C; if so, reject the reading and read again.

SAI version - 7.1.76.4
[CSP CS00012334537] backport SONIC-74283 to SAI7.1. Removed the restriction on FADT max on VSQ config

SAI version - 7.1.75.4
[CSP CS00012330453] backport SONIC-85159_to_SAI7.1 add pp drop to Egress queue counter's drop reason, then egress queue counter increases for DNX ERPP Trap drop packet

SAI version - 7.1.74.4
(Info : Released before as part of 7.1.66.6)
[CSP CS00012332612] backport_SONIC-85922 to SAI7.1 revert EGQ2SCH internal flow control on JR2
[CSP CS00012316306] backport SONIC-80654 to SAI 7.1 fix SAI qos_prof allocation issue
[CSP CS00012316306] SAI - backport SONIC-64658 to SAI7.1 After this fix below SAI debugcommands work

SAI version - 7.1.73.4
[CSP CS00012322843] SAI_API_ROUTE:brcm_sai_xgs_route_create:115 iptnl info get failed with error -7

SAI version - 7.1.72.4
(Info : Released before as part of 7.1.66.6)
BACKPORT SONIC-81858 PFCWD on IPv6
Issue Summary: PFCWD didn't drop IPv6 traffic in storm condition
Root Cause: EPMF is not created correctly
Fix Description: Add support for the ipv6 traffic

SAI version - 7.1.71.4
SID: MMU cosq control configuration with Dynamic Type Check

SAI version - 7.1.70.4
Backport SONIC-79944 to rel_ocp_sai_7_1
Issue Summary: Enable learning only if the port/lag is a bridge-port with ADMIN UP
Root Cause: Learning was enabled in all cases
Fix Description: Learning enabled only if the port/lag is a bridge-port with ADMIN UP

SAI version - 7.1.69.4
ECMP LB traffic polarization, configure hash_offset along with hash_seed attr

SAI version - 7.1.68.4
Update git submodules
Update sdk-src/hsdk_6.5.24_SAI_7.1.0_GA from branch 'hsdk_6.5.24_SAI_7.1.0_GA' to 00c41616710e8254f89b0bdd8af869b046ea0899
[CSP CS00012316299][SAI_BRANCH rel_ocp_sai_7_1] L3 entry delete failed when SER error is present

SAI version - 7.1.67.4
[CSP CS00012302165] Backport JIRA SONIC-77116 to rel_ocp_sai_7_1
Issue Summary: DNX ECMP hash fields clear before set
Root Cause: Original hash field selection was not being cleared when user updated the selection.
Fix Description: Field selection is cleared and only newly supplied selection is used.

How to verify it

Verified on SONiC T2 chassis for basic tests.
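The PVTMON fix description in the changelog above (read five times, reject the reading when the values differ by more than 5 °C, then read again) can be pictured with the following sketch. It is one possible interpretation, not the SDK implementation: the function name `read_temperature`, the `read_sensor` callback, the retry limit, and the choice to return the mean of an accepted batch are all assumptions.

```python
from statistics import mean
from typing import Callable, Optional


def read_temperature(read_sensor: Callable[[], float],
                     samples: int = 5,
                     max_spread_c: float = 5.0,
                     max_attempts: int = 3) -> Optional[float]:
    """Return a filtered temperature, or None if every batch was rejected."""
    for _ in range(max_attempts):
        batch = [read_sensor() for _ in range(samples)]
        # Accept the batch only when its values agree within the allowed spread.
        if max(batch) - min(batch) <= max_spread_c:
            return mean(batch)
        # Otherwise treat the batch as containing an outlier and read again.
    return None
```

A single spurious spike then costs one extra batch of reads instead of being reported as a real temperature.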

….26.0.36 (sonic-net#19262) Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>

…tomatically (sonic-net#19317) src/sonic-linux-kernel * 58d3bf5 - (HEAD -> 202205, origin/202205) [ci] Migrate to sonicbld1es agent pool. (sonic-net#397) (4 weeks ago) [Liu Shilong]

…lly (sonic-net#18254) src/sonic-swss * dbe70dae - (HEAD -> 202205, origin/202205) Fix acl match ip_type_non_ipv4 and ip_type_non_ipv6. (sonic-net#2842) (4 months ago) [LTeng]

…ISTRY if it is not null (sonic-net#19466) (sonic-net#19565)

Why I did it

DEFAULT_CONTAINER_REGISTRY didn't work as expected in some scenarios.

How I did it

When checking the docker arch, use DEFAULT_CONTAINER_REGISTRY if it is not null.

Co-authored-by: Liu Shilong <shilongliu@microsoft.com>

…atically (sonic-net#19224) src/sonic-utilities * 8e71e656 - (HEAD -> 202205, origin/202205) [build] Fix base OS compilation issue caused by incompatibility between urllib3 and requests packages (sonic-net#3328) (sonic-net#3355) (5 weeks ago) [Volodymyr Samotiy]

…D automatically (sonic-net#18744) src/sonic-platform-daemons * dee5310 - (HEAD -> 202205, origin/202205) [ci] Fix 202205 pipeline issue. (sonic-net#474) (3 months ago) [Liu Shilong]

sonic-net#13422) (sonic-net#13530)" (sonic-net#19616) This reverts commit 3860186.

Backport from master branch.

Why I did it

Backport the graceful reboot in place of the sysfs power cycle to avoid filesystem corruption.

How I did it

Rename the platform_reboot script to pre_reboot_hook and remove the sysfs power cycle function. From now on, the Debian reboot (/sbin/reboot) is executed instead of the sysfs power cycle.

How to verify it

Start watching logs by using show log -f and journalctl -p debug -f.
Execute the reboot command from the switch CLI.
Check in the logs that all systemd services terminated.

Signed-off-by: Jianyue Wu <jianyuew@nvidia.com>

mssonicbld · 2025-02-08T02:17:50Z

/azp run Azure.sonic-buildimage

azure-pipelines · 2025-02-08T02:17:56Z

Pull request contains merge conflicts.

mssonicbld and others added 30 commits July 13, 2023 08:27

[ci/build]: Upgrade SONiC package versions (sonic-net#15766)

7c6a161

Upgrade XGS SAI version to 7.1.54.4 (sonic-net#15820)

e73924e

Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>

Pick dependency files in submodules. (sonic-net#15142) (sonic-net#15827)

9cd3319

[submodule] Update submodule sonic-swss to the latest HEAD automatica…

4a248c5

…lly (sonic-net#15834) src/sonic-swss * 493f66f5 - (HEAD -> 202205, origin/202205) Remove redundant updateFabricPortState (sonic-net#2850) (6 hours ago) [kenneth-arista]

[submodule] Update submodule sonic-sairedis to the latest HEAD automa…

a3b0e7f

…tically (sonic-net#15833) src/sonic-sairedis * 56aee6c - (HEAD -> 202205, origin/202205) [202205] Advance SAI submodule to fix issue sonic-net#14706 (sonic-net#1265) (10 hours ago) [abdosi]

Potential fix for Celestica E1031 device hang (sonic-net#15822) (soni…

f02ca9a

…c-net#15844)

update rsyslog log size conf (sonic-net#15821) (sonic-net#15845)

1b32bf6

cherry-pick frr log enhancement for graceful restart (sonic-net#15863)

1243a6e

Graceful restart is a key event for bgpd, but the related log messages are printed at debug level. Change them to info level to get more visibility when this kind of event is triggered.

[ci/build]: Upgrade SONiC package versions (sonic-net#15855)

0291dae

[submodule] Update submodule sonic-swss to the latest HEAD automatica…

f55d574

…lly (sonic-net#15887) src/sonic-swss * ed0ca898 - (HEAD -> 202205, origin/202205) Add missing parameter to on_switch_shutdown_request method. (sonic-net#2567) (34 hours ago) [Hua Liu]

[submodule update] sonic-sairedis (sonic-net#15831)

b4f86cd

56aee6c0dfc4705584764edd35b7b2977bb762eb (HEAD -> 202205, origin/202205) [202205] Advance SAI submodule to fix issue sonic-net#14706 (sonic-net#1265) Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

Update WRED profile on system ports (sonic-net#15612) (sonic-net#15914)

ab0768e

* Update WRED profile on system ports Co-authored-by: vmittal-msft <46945843+vmittal-msft@users.noreply.github.com>

[submodule] Update submodule sonic-swss to the latest HEAD automatica…

54b21c1

…lly (sonic-net#15928) src/sonic-swss * 17c4d731 - (HEAD -> 202205, origin/202205) Remove system neighbor DEL operation in m_toSync if SET operation for (sonic-net#2853) (3 hours ago) [Song Yuan]

[ci/build]: Upgrade SONiC package versions (sonic-net#15939)

a03489a

[202205] [Mellanox] Advance SAI submodule pointer (sonic-net#15970)

5475f85

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>

vmittal-msft and others added 23 commits March 17, 2024 22:20

Updated DNX chip bcmsai version to 7.1.66.6 (sonic-net#18379)

0303be4

Updated DNX chip bcmsai version 7.1.66.7 (sonic-net#18671)

bdc1675

Add the following fixes to the SAI:

* BRCM CSP # CS00012343052 - ESB_ECC_Ecc_2bErrInt seen in production
* BRCM CSP # CS00012333604 - J2C+ internal thermal sensor(s) spiking high spontaneously, seemingly in error

Revert "[multi-asic] Fixed 13137 ERR python3: :- initializeGlobalConf…

b48a2b6

…ig: Soni…" (sonic-net#18743) This reverts commit 87a5820.

[ci] Migrate ubuntu and sonicbld agent pool to fix S360 alert. (sonic…

442fe3e

…-net#19102)

Why I did it

Migrate agent pool to fix S360 ticket.

Work item tracking

Microsoft ADO (number only): 27889786

How I did it

How to verify it

[202205] [Mellanox] Update SDK/FW to 4.5.4520/2010.4518 & SAI to 2205…

f471848

….26.0.36 (sonic-net#19262) Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>

[202205][Arista] Update arista platform submodules (sonic-net#19338)

7cc9c39

[submodule] Update submodule sonic-linux-kernel to the latest HEAD au…

f61cf80

…tomatically (sonic-net#19317) src/sonic-linux-kernel * 58d3bf5 - (HEAD -> 202205, origin/202205) [ci] Migrate to sonicbld1es agent pool. (sonic-net#397) (4 weeks ago) [Liu Shilong]

[submodule] Update submodule sonic-swss to the latest HEAD automatica…

171704c

…lly (sonic-net#18254) src/sonic-swss * dbe70dae - (HEAD -> 202205, origin/202205) Fix acl match ip_type_non_ipv4 and ip_type_non_ipv6. (sonic-net#2842) (4 months ago) [LTeng]

[submodule] Update submodule sonic-platform-daemons to the latest HEA…

2ceacd4

…D automatically (sonic-net#18744) src/sonic-platform-daemons * dee5310 - (HEAD -> 202205, origin/202205) [ci] Fix 202205 pipeline issue. (sonic-net#474) (3 months ago) [Liu Shilong]

[ci/build]: Upgrade SONiC package versions (sonic-net#17918)

9556367

Revert "[sudoers] add /usr/local/bin/storyteller to READ_ONLY_CMDS (

3d8f196

sonic-net#13422) (sonic-net#13530)" (sonic-net#19616) This reverts commit 3860186.

Added code to process-reboot-check that mitigates JSON decoder errors

acaa8a1

jianyuewu requested review from StormLiangMS, qiluo-msft, xumia and lguohan as code owners February 8, 2025 02:17

jianyuewu closed this Feb 8, 2025
