Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pthread_exit crashes on threads created by std::thread::spawn in 1.84, not 1.83, breaking pyo3-log #135929

Open
arielb1 opened this issue Jan 23, 2025 · 9 comments
Labels
A-thread Area: `std::thread` C-bug Category: This is a bug. S-has-bisection Status: a bisection has been found for this issue S-has-mcve Status: A Minimal Complete and Verifiable Example has been found for this issue T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@arielb1
Copy link
Contributor

arielb1 commented Jan 23, 2025

Meta

Tested on Ubuntu 24.04 and Amazon Linux 2, x86_64.

The Production Problem

The crate pyo3-log installs a bridge that makes log functions call into Python. This means that all calls to logging::info! etc will take the GIL.

Python has a stage during interpreter shutdown where attempts to take the GIL will cause a pthread_exit. Python 3.14 (still Alpha today, targeted to be released by the end of this year) will change this in python/cpython#87135 - but that will take some time to reach people.

This means that if you have a Python program that uses a Rust library and pyo3-log, that spawning a Rust thread, that is calling logging::info! in a way unsynchronized with interpreter exit, you'll have unpredicatable crashes in 1.84.

Minified Program

This program:

use std::ffi::c_void;

extern "C" {
    fn pthread_exit(retval: *const c_void);
}
fn main() {
    std::thread::spawn(|| {
        unsafe { pthread_exit(std::ptr::null()); }
    });
    std::thread::sleep(std::time::Duration::from_secs(1));
}

when compiled with the following options

rustc +1.84 d.rs -Cpanic=abort -Cdebuginfo=limited

crashes with this confusing error

thread '<unnamed>' panicked at core/src/panicking.rs:223:5:
panic in a function that cannot unwind
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/panicking.rs:665:5
   1: core::panicking::panic_nounwind_fmt::runtime
             at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/panicking.rs:119:22
   2: core::panicking::panic_nounwind_fmt
             at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/intrinsics/mod.rs:3535:9
   3: core::panicking::panic_nounwind
             at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/panicking.rs:223:5
   4: core::panicking::panic_cannot_unwind
             at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/panicking.rs:315:5
   5: std::sys::pal::unix::thread::Thread::new::thread_start
             at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/sys/pal/unix/thread.rs:99:9
   6: start_thread
   7: clone
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
thread caused non-unwinding panic. aborting.
Aborted

This crash happens:

  1. Only on 1.84, not on 1.83
  2. Only when debuginfo is enabled, but even if the binary is stripped.

When using -C panic=unwind instead, on all versions of the compiler, you get this error:

FATAL: exception not rethrown
Aborted (core dumped)

I seen the claim in Zulip (https://rust-lang.zulipchat.com/#narrow/channel/122651-general/topic/pthread_exit.20from.20a.20Rust-spawned.20thread) that this is undefined behavior, but I'll rather not break pyo3-log

@arielb1 arielb1 added the C-bug Category: This is a bug. label Jan 23, 2025
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jan 23, 2025
@jieyouxu jieyouxu added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Jan 23, 2025
@arielb1
Copy link
Contributor Author

arielb1 commented Jan 23, 2025

See #129582 and #74990

@jieyouxu jieyouxu added A-thread Area: `std::thread` T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 23, 2025
@jieyouxu
Copy link
Member

Relevant discussions: rust-lang/unsafe-code-guidelines#404

@arielb1
Copy link
Contributor Author

arielb1 commented Jan 23, 2025

Even if it's bad, I believe we need to have a working way to write Python libraries in Rust, even if the libraries spawn Rust threads that can potentially call into Python. debuginfo = 0 is a working way, but it feels too ugly.

The other argument is that Rust 1.84 has the MSRV-aware resolver, so e.g. reverting the change just for 1.84, leaving people stuck in Rust 1.84, is much better than leaving them stuck in 1.83.

I believe it would be possible to make pyo3-log marshal everything to a Python thread, but that would not work if there are non-pyo3-log causes of the problem.

@arielb1
Copy link
Contributor Author

arielb1 commented Jan 23, 2025

cc @vorner for the pyo3-log problem

@Noratrieb Noratrieb added E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc S-has-mcve Status: A Minimal Complete and Verifiable Example has been found for this issue and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jan 23, 2025
@arielb1
Copy link
Contributor Author

arielb1 commented Jan 23, 2025

As expected:
Bad/regressed (new) commit: 06d261d Auto merge of #129582 - nbdd0121:unwind, r=nnethercote
Good (old) commit: dd51276 Auto merge of #131796 - cuviper:no-waker-waker, r=Mark-Simulacrum

@jieyouxu jieyouxu added S-has-bisection Status: a bisection has been found for this issue and removed E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc labels Jan 23, 2025
@arielb1
Copy link
Contributor Author

arielb1 commented Jan 23, 2025

A solution is also possible in pyo3:

Create C++ code that does

PyGILState_STATE safe_gil_ensure() {
    try {
        return PyGILState_Ensure();
    } catch(...) {
        for(;;) pause();
    }
}

and replace the calls to PyGILState_Ensure in pyo3 with it. It's basically the Python 3.14 solution but implemented within pyo3

@nikomatsakis
Copy link
Contributor

cc @davidhewitt to discuss best resolution here

@davidhewitt
Copy link
Contributor

See also vorner/pyo3-log#30 and PyO3/pyo3#2102 (@arielb1 I see you have already commented on the pyo3-log issue).

It seems to me that getting a fix into PyO3 is the most reasonable approach.

The proposed C++ snippet... seems fine, but is also somewhat heavy. I would imagine that requires adding cc as a build dependency of pyo3-ffi, and requiring a working C++ toolchain for users compiling PyO3 libraries. Not the end of the world of course.

I don't see exactly which change in 1.84 is responsible for the regression, is it well understood? I also think this has always been UB, and 1.84 didn't change that?

See also PyO3/pyo3#3386 - really what I think needs to happen is there needs to be a way to attempt to acquire the GIL which can fail gracefully. pyo3-log should then switch to using that. This probably needs upstream CPython API but has been out of my capacity to propose recently.

@arielb1
Copy link
Contributor Author

arielb1 commented Jan 23, 2025

I don't see exactly which change in 1.84 is responsible for the regression, is it well understood? I also think this has always been UB, and 1.84 didn't change that?

As far as I can tell it has always been UB. The change in #129582 had made it crash with panic=abort, before that it also crashed with panic=unwind.

As far as I can tell Python has no non-racy way of fallibly acquiring the GIL. I think that doing the Python 3.14 thing is good enough.

The proposed C++ snippet... seems fine, but is also somewhat heavy. I would imagine that requires adding cc as a build dependency of pyo3-ffi, and requiring a working C++ toolchain for users compiling PyO3 libraries. Not the end of the world of course.

Sure. But I can't think of an easier non-racy way of doing things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-thread Area: `std::thread` C-bug Category: This is a bug. S-has-bisection Status: a bisection has been found for this issue S-has-mcve Status: A Minimal Complete and Verifiable Example has been found for this issue T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants