Sporadic segfaults #100
Hi @skyeagle, Thank you for opening this issue. It really annoys me that there's a bug that I have no idea where to hunt for. Then again, sometimes it seems like my job is writing tests and searching for bugs more than writing code. If you have any ideas on how to reproduce this, that would be great, since I can't test for a bug or a patch unless I can reproduce the issue. Also:
The thing that really gets to me is that the whole container crashes. Assuming iodine runs in cluster mode, any child that crashes should be respawned. If the master process crashes, you would probably get more log messages than you do (child processes will complain before shutting down). Besides, the master process is fairly lazy: except for pub/sub and Redis connectivity, all it does is monitor the worker processes for failures (crashes). Could you run iodine with debug level verbosity? It's a lot of junk output, I know, sorry about that, but that's how to get debugging messages out of iodine. Waiting for more information. Thank you again for opening this issue!
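(For readers following along, a minimal sketch of what such an invocation might look like; the -V verbosity flag is an assumption and should be checked against the installed version's CLI help, while -w and -t are the flags mentioned later in this thread.)

    # Hypothetical invocation: -V as a 0-5 log-verbosity flag is an assumption;
    # -w (workers) and -t (threads) are the flags discussed in this thread.
    bundle exec iodine -p 3000 -w 4 -t 8 -V 5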
@boazsegev thanks for your feedback and patience. Answering your questions...
Yes, it's compiled with
I'm not aware of any.
Yes. It's been running with 1-N workers. Is that considered cluster mode? I have not seen this issue today, but I made some changes per your suggestion for better debugging in production for the next occurrence. I'll report back as soon as I get something new. Thanks again for your great work on this server!
Cluster mode is activated when iodine runs more than a single process, in which case it automatically monitors child processes (workers) for failure and re-spawns them. Usually this is achieved either automatically (on systems with more than 2 CPU cores) or explicitly with the worker flag, e.g. iodine -w -1 -t 8. I'll wait for more information.
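(A minimal sketch of the two approaches described above, assuming a typical bundler-based setup; the reading of a negative worker count as being resolved relative to the CPU core count is an assumption, not something stated in this thread.)

    # Explicit cluster mode: 4 worker processes, 8 threads per worker.
    bundle exec iodine -w 4 -t 8

    # The form from the comment above; a negative worker count is assumed here
    # to be resolved relative to the number of CPU cores.
    bundle exec iodine -w -1 -t 8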
Hi, Please find attached a segfault log I've just gotten from one of my servers. I hope it could be useful.
System Information
Iodine startup info:
Hi, Maybe the C backtrace with source line numbers could be helpful:
The full log is downloadable from the link in my previous comment. Thanks.
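(A rough sketch of how a C backtrace with source line numbers, like the one referenced above, can be produced from a core dump; every path below is a placeholder, and line numbers only appear if the iodine extension was built with debug symbols.)

    # Open the core dump with the Ruby binary that produced it (paths are placeholders).
    gdb "$(which ruby)" /path/to/core
    #   (gdb) bt full            # full C backtrace with local variables
    #   (gdb) info sharedlibrary # load addresses of the native extensions

    # Alternatively, resolve a frame offset reported in the crash log
    # (e.g. iodine.so(+0x12345)) against the extension's shared object:
    addr2line -f -e /path/to/gems/iodine/lib/iodine/iodine.so 0x12345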
I went over the logs, but they didn't provide enough data, especially considering I cannot seem to reproduce the issue. Are you using iodine's TLS/SSL layer? I know there are some possible OpenSSL memory management issues (they might be iodine related or OpenSSL bugs, I'm not too sure).
Hi Bo, Thanks for your reply. Yes, my Iodine setup includes TLS/SSL. -- fabio
Hi Fabio, If you're not actively using TLS/SSL (i.e., using …) -- Bo
Hi, I've enabled Iodine's full log (…). -- fabio
Thanks Fabio. Do I understand correctly that this occurs on only one of the production servers? Do you use iodine on other production servers? If so, what's the difference? What features of iodine are you using on the server that sporadically fails that you might not be using on other servers? What other C extensions are you using on that server (it might be a memory corruption error from another extension that causes this)? -- Bo.
I use Iodine on 6 servers with the same system and application configuration. All servers have sporadic segfaults, so server comparison gives us no helpful information. I've enabled the verbose log on only one server for now as a precaution, just to understand whether the verbosity could cause other issues (loss of performance, disk space saturation, etc.). If I see no problem, I'll enable the full log on all servers. -- fabio
Hi, Please find attached the log of the core dump with debug level verbosity (…). I intentionally left a lot of lines before the crash in the file to give you more context. The core dump is at line 7332. I really hope this helps narrow down the issue. -- fabio
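(For context, a sketch of how core dumps like the one attached here are typically enabled on a Linux host or container; the paths and the Docker flag are illustrative, not taken from this thread.)

    # Allow core files of unlimited size in the current shell.
    ulimit -c unlimited

    # Write core files to a predictable location (kernel-wide setting; inside
    # Docker this is configured on the host, and the container may need --ulimit core=-1).
    echo '/tmp/core.%e.%p' | sudo tee /proc/sys/kernel/core_pattern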
I'm looking into this... The only thing I can think of is that the fallback IO thread (the one that makes sure any pending data is sent even when the Ruby threads are all busy) is somehow attempting to access an invalid pointer in an empty (new) connection.
System Information
Description
We get sporadic segfaults from time to time.
Actual behavior
There is no way we can reproduce this, since it happens out of the blue in production and the entire container dies without any additional debug information other than the following backtrace:
It doesn't happen very often, but it's still a bit concerning moving forward.
Any help would be appreciated! Thank you!