Garnet crash on load data #766
Garnet 1.0.37, 1.0.39:
Same error on EPYC 9634 (Zen4), Debian 11 (ldd (Debian GLIBC 2.31-13+deb11u8) 2.31) kernel 5.15.143-1-pve, zfs:
But no error on a Ryzen 9 7900X (Zen4), Arch Linux (ldd (GNU libc) 2.40), kernel 6.11.5-arch1-1, ext4 on LUKS; same run under podman (conmon), but rootless:
But: the scan took 677 minutes to complete. This is extremely slow!
Can you share exactly what data is loaded into Redis before dumping to the RESP file? The repro might depend on the number of keys, the size of the values, etc. that are being dumped.
Also, was a save/bgsave invoked, either by a client or by the dump-loading tool? The error indicates that a read was conducted, but why loading a dump would cause a read is not clear. Any idea what operations were actually being done? Why would a read have been performed?
Try loading the same data directly using Resp.benchmark and see if scan is still slow. It is hard to say what might be causing this slowdown with the given information; the hash table may be too small, leading to lots of collisions, for example.
Make sure you did not try to store more than this number of distinct objects.
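The collision concern above can be sketched numerically. This is a generic illustration, assuming a chained hash index; the bucket count is a hypothetical example, not Garnet's actual default:

```python
# Sketch: average chain length (load factor) of a chained hash index grows
# linearly with key count, so an undersized index slows every lookup and scan.
# The 16M bucket count below is hypothetical, for illustration only.
def avg_chain_length(n_keys: int, n_buckets: int) -> float:
    return n_keys / n_buckets

# 61M keys over a 16M-bucket index: ~3.6 entries per bucket on average
print(round(avg_chain_length(61_000_000, 16 * 1024 * 1024), 1))  # → 3.6
```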
Unfortunately, I cannot provide the data itself - it is private. But here is the result of the command.
Also, I tried loading different datasets - 61M, 30M, and 3M keys - and the failure occurs when 2M-2.5M keys have been processed.
No, no save was performed; the scan was performed only on data in memory.
No special reads were performed during loading. I will run further tests next week.
Data was loaded (on the EPYC server)
with the following parameters.
But scan is still slow on the EPYC with 336 cores (and the same on the Ryzen 9 7900X with 16 cores). For comparison, a scan of the same data on KeyDB:
1.0.44 has the same slow scan:
And scan speed drops once ~440-443 MiB have been scanned (on different Garnet versions).
I didn't understand how to load data with Resp.benchmark.
Is the issue with the "garnet crash" still there, or is that no longer happening? Scan is a separate issue - could you please provide a repro, including data (generated data is fine, since you cannot share the real data), for us to diagnose that further?
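As a starting point for a generated-data repro, here is a hedged sketch of a script that writes a synthetic RESP pipe file of SET commands. The key names, value sizes, and key count are assumptions to be tuned toward the shape of the real dataset:

```python
# Hypothetical sketch: generate a synthetic RESP pipe file of SET commands,
# loadable with `redis-cli --pipe < synthetic.resp`.
import os

def resp_set(key: bytes, value: bytes) -> bytes:
    # RESP array of 3 bulk strings: SET <key> <value>
    return (b"*3\r\n$3\r\nSET\r\n"
            + b"$%d\r\n%s\r\n" % (len(key), key)
            + b"$%d\r\n%s\r\n" % (len(value), value))

def write_synthetic(path: str, n_keys: int, value_bytes: int) -> None:
    with open(path, "wb") as f:
        for i in range(n_keys):
            key = b"key:%d" % i
            # random hex payload of roughly value_bytes characters
            value = os.urandom(value_bytes // 2).hex().encode()
            f.write(resp_set(key, value))

# e.g. write_synthetic("synthetic.resp", 2_500_000, 128)  # near the failing range
```

The resulting file can then be loaded the same way as the original dump: `redis-cli --pipe < synthetic.resp`.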
@badrishc With the above configuration, Garnet no longer crashes. OK, I will create a separate issue about the slow scan.
@rnz is there still a load-related issue here that should be looked into? |
@TalZaccai |
In version 1.0.51, I get a more informative message when loading data fails with AOF enabled:
And now Garnet does not crash:
I tried different AOF settings:
...
Do you get the error even after setting the AOF page size to be sufficiently large? The error with a small AOF page size is expected, because the key-value pair needs to fit on one AOF page; we cannot break up records to span multiple AOF pages. If there is still a problem even after setting the AOF page size to be larger than the inserted records, then there might be an issue. In that case, we would also need some repro logic and synthetic data that causes the crash on loading, so that we can diagnose this further.
Yes.
I'm trying to determine the key that causes this error. It would be easier if the error message displayed the key on which the error occurred.
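Until the error message reports the key, one workaround is to scan the dump for records too large to fit on one AOF page. This is a hypothetical sketch, assuming the RESP file contains only simple SET commands and ignoring any per-record AOF header overhead Garnet may add:

```python
# Hypothetical sketch: find SET commands in a RESP pipe file whose key+value
# payload exceeds a given AOF page size.
def oversized_sets(path: str, page_size: int):
    hits = []
    with open(path, "rb") as f:
        while True:
            header = f.readline()
            if not header:
                break                         # end of file
            if not header.startswith(b"*"):
                continue                      # skip anything unexpected
            nargs = int(header[1:])           # "*<n>\r\n"
            args = []
            for _ in range(nargs):
                blen = int(f.readline()[1:])  # "$<len>\r\n"
                args.append(f.read(blen))
                f.read(2)                     # consume trailing "\r\n"
            if len(args) == 3 and args[0].upper() == b"SET":
                record = len(args[1]) + len(args[2])
                if record >= page_size:       # record must fit on one page
                    hits.append((args[1], record))
    return hits
```

Any key it reports is a candidate for the record that overflows the AOF page.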
Describe the bug
Here
redis-cli --pipe < db0.resp
Steps to reproduce the bug
redis-cli --pipe < db0.resp
Expected behavior
All data from the RESP file is fully loaded into Garnet by redis-cli.
Screenshots
No response
Release version
v1.0.35
IDE
No response
OS version
debian 11
Additional context
FYI: Dragonfly loads the same RESP file successfully (in the same environment).