Fix data race when using shared variables (free threading) #5494

Merged
merged 9 commits into from
Jan 16, 2025
Changes from 4 commits

24 changes: 24 additions & 0 deletions include/pybind11/detail/class.h
@@ -312,7 +312,31 @@ inline void traverse_offset_bases(void *valueptr,
    }
}

#ifdef Py_GIL_DISABLED
static inline void enable_try_inc_ref(PyObject *op) {
Collaborator

ChatGPT:

The static in this context is not needed, but it is not a no-op either: inline does not imply internal linkage in C++. Let's break it down:

static in C++ for functions: When applied to a namespace-scope function, static gives the function internal linkage, meaning each translation unit that includes the header gets its own private copy.

inline in C++: Functions defined as inline have the following implications:

They can be defined in a header file and included in multiple translation units without violating the One Definition Rule (ODR), provided every definition is identical.
They keep external linkage unless they are also declared static or placed in an unnamed namespace.
When inline is used alone, all translation units share one function (one address, one copy of any function-local statics). Adding static is not harmful here, but it only produces a separate copy per translation unit, which this helper does not need.

Suggested Best Practice
To avoid confusion, it is generally better to omit static when inline is used in a header, unless there's a specific reason to want a private copy in each translation unit.
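For reference, the linkage difference can be sketched in a few lines. In standard C++, a namespace-scope `inline` function keeps external linkage, while `static inline` gives it internal linkage. The function names below are hypothetical and only illustrate the rule; they are not part of pybind11.

```cpp
#include <cassert>

// External linkage: every translation unit that includes this definition
// shares one function and one copy of `calls`.
inline int &shared_calls() {
    static int calls = 0;
    return calls;
}

// Internal linkage: each translation unit that includes this definition
// gets its own function and its own copy of `calls`.
static inline int &private_calls() {
    static int calls = 0;
    return calls;
}
```

Within a single translation unit the two behave the same; the difference only shows up when the header is included from several source files.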

    // TODO: Replace with PyUnstable_Object_EnableTryIncRef when available.
    // See https://github.com/python/cpython/issues/128844
    if (_Py_IsImmortal(op)) {
Collaborator

Would it make sense to rename op to obj? (Or are there any special requirements for op that you want to reflect with the variable name?)

Contributor Author

No special reason. op is just commonly used as a variable name for Python objects in CPython so that's what I'm used to now.

        return;
    }
    for (;;) {
        Py_ssize_t shared = _Py_atomic_load_ssize_relaxed(&op->ob_ref_shared);
        if ((shared & _Py_REF_SHARED_FLAG_MASK) != 0) {
            // Nothing to do if it's in WEAKREFS, QUEUED, or MERGED states.
            return;
        }
        if (_Py_atomic_compare_exchange_ssize(
                &op->ob_ref_shared, &shared, shared | _Py_REF_MAYBE_WEAKREF)) {
            return;
        }
    }
}
#endif
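
The compare-exchange loop in `enable_try_inc_ref` can be illustrated with a minimal standalone sketch built on `std::atomic`. The constants `kFlagMask` and `kMaybeWeakref` and the function name are hypothetical stand-ins chosen for illustration; they are assumptions, not values taken from CPython's headers.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical flag layout for illustration only (assumed, not CPython's).
constexpr std::intptr_t kFlagMask = 0x3;     // low two bits hold a state
constexpr std::intptr_t kMaybeWeakref = 0x1; // state that permits try-incref

// Move the state from 0 to kMaybeWeakref; leave any other state untouched.
inline void enable_flag_sketch(std::atomic<std::intptr_t> &shared) {
    std::intptr_t seen = shared.load(std::memory_order_relaxed);
    for (;;) {
        if ((seen & kFlagMask) != 0) {
            return; // already in a non-default state, nothing to do
        }
        if (shared.compare_exchange_weak(seen, seen | kMaybeWeakref)) {
            return; // this thread published the flag
        }
        // On failure `seen` was refreshed with the current value; retry.
    }
}
```

A plain `fetch_or` would not be equivalent, because the flag must only be set when no other state bit is present.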

inline bool register_instance_impl(void *ptr, instance *self) {
#ifdef Py_GIL_DISABLED
    enable_try_inc_ref((PyObject *) self);
Collaborator

Could you please use

reinterpret_cast<PyObject *>(self)

?
(for readability; we still have a lot of raw C casts, but C++-style casts are preferred in new or changed code)

#endif
    with_instance_map(ptr, [&](instance_map &instances) { instances.emplace(ptr, self); });
    return true; // unused, but gives the same signature as the deregister func
}
48 changes: 47 additions & 1 deletion include/pybind11/detail/type_caster_base.h
@@ -241,6 +241,49 @@ PYBIND11_NOINLINE handle get_type_handle(const std::type_info &tp, bool throw_if
    return handle(type_info ? ((PyObject *) type_info->type) : nullptr);
}

inline bool try_incref(PyObject *obj) {
    // Tries to increment the reference count of an object if it's not zero.
    // TODO: Use PyUnstable_TryIncref when available.
    // See https://github.com/python/cpython/issues/128844
#ifdef Py_GIL_DISABLED
    // See
    // https://github.com/python/cpython/blob/d05140f9f77d7dfc753dd1e5ac3a5962aaa03eff/Include/internal/pycore_object.h#L761
    uint32_t local = _Py_atomic_load_uint32_relaxed(&obj->ob_ref_local);
    local += 1;
    if (local == 0) {
        // immortal
        return true;
    }
    if (_Py_IsOwnedByCurrentThread(obj)) {
        _Py_atomic_store_uint32_relaxed(&obj->ob_ref_local, local);
#    ifdef Py_REF_DEBUG
        _Py_INCREF_IncRefTotal();
#    endif
        return true;
    }
    Py_ssize_t shared = _Py_atomic_load_ssize_relaxed(&obj->ob_ref_shared);
    for (;;) {
        // If the shared refcount is zero and the object is either merged
        // or may not have weak references, then we cannot incref it.
        if (shared == 0 || shared == _Py_REF_MERGED) {
            return false;
        }

        if (_Py_atomic_compare_exchange_ssize(
                &obj->ob_ref_shared, &shared, shared + (1 << _Py_REF_SHARED_SHIFT))) {
#    ifdef Py_REF_DEBUG
            _Py_INCREF_IncRefTotal();
#    endif
            return true;
        }
    }
#else
    assert(Py_REFCNT(obj) > 0);
    Py_INCREF(obj);
    return true;
#endif
}
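
The core idea of `try_incref`, incrementing a count only if it has not already dropped to zero, can be sketched independently of CPython's split local/shared refcount. The `Counted` type and `try_incref_sketch` below are hypothetical and use a single `std::atomic` counter; this is a simplified illustration, not the layout or logic CPython actually uses.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for an object header (assumed, not CPython's layout).
struct Counted {
    std::atomic<long> refcount{1};
};

// Increment the count only if it is currently non-zero.
// Returns false when the object is already on its way to destruction.
inline bool try_incref_sketch(Counted &obj) {
    long seen = obj.refcount.load(std::memory_order_relaxed);
    while (seen != 0) {
        // On failure, compare_exchange_weak refreshes `seen` and the loop retries.
        if (obj.refcount.compare_exchange_weak(seen,
                                               seen + 1,
                                               std::memory_order_acq_rel,
                                               std::memory_order_relaxed)) {
            return true;
        }
    }
    return false;
}
```

An unconditional increment would be unsafe here: another thread could have already reduced the count to zero and begun tearing the object down.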

// Searches the inheritance graph for a registered Python instance, using all_type_info().
PYBIND11_NOINLINE handle find_registered_python_instance(void *src,
                                                         const detail::type_info *tinfo) {
@@ -249,7 +292,10 @@ PYBIND11_NOINLINE handle find_registered_python_instance(void *src,
        for (auto it_i = it_instances.first; it_i != it_instances.second; ++it_i) {
            for (auto *instance_type : detail::all_type_info(Py_TYPE(it_i->second))) {
                if (instance_type && same_type(*instance_type->cpptype, *tinfo->cpptype)) {
                    return handle((PyObject *) it_i->second).inc_ref();
                    PyObject *wrapper = (PyObject *) it_i->second;
Collaborator

I think

auto *wrapper = reinterpret_cast<PyObject *>(it_i->second);

will make clang-tidy happy (but I haven't tried it out myself).

                    if (try_incref(wrapper)) {
                        return handle(wrapper);
                    }
                }
            }
        }
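
The changed lookup skips a registered wrapper whose reference count can no longer be raised instead of handing it out. A rough standalone sketch of that pattern follows, assuming a hypothetical `Wrapper` type, a `Registry` multimap, and a `try_acquire` helper; locking of the registry is omitted here for brevity, unlike in the real code.

```cpp
#include <atomic>
#include <cassert>
#include <unordered_map>

// Hypothetical stand-in for a Python wrapper object (illustration only).
struct Wrapper {
    std::atomic<long> refcount{1};
};

// Increment the count unless it has already reached zero.
inline bool try_acquire(Wrapper &w) {
    long seen = w.refcount.load(std::memory_order_relaxed);
    while (seen != 0) {
        if (w.refcount.compare_exchange_weak(seen, seen + 1)) {
            return true;
        }
    }
    return false;
}

using Registry = std::unordered_multimap<const void *, Wrapper *>;

// Return a wrapper registered for `ptr` with its count already incremented,
// or nullptr if every candidate is dead (the caller would then create a new one).
inline Wrapper *find_live_wrapper(const Registry &registry, const void *ptr) {
    auto range = registry.equal_range(ptr);
    for (auto it = range.first; it != range.second; ++it) {
        if (try_acquire(*it->second)) {
            return it->second;
        }
    }
    return nullptr;
}
```

Falling through to `nullptr` mirrors the diff: when `try_incref` fails, the loop keeps searching rather than returning a handle to a dying object.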
6 changes: 6 additions & 0 deletions tests/test_thread.cpp
@@ -28,6 +28,9 @@ struct IntStruct {
    int value;
};

struct EmptyStruct {};
static EmptyStruct SharedInstance;

} // namespace

TEST_SUBMODULE(thread, m) {
@@ -61,6 +64,9 @@ TEST_SUBMODULE(thread, m) {
        },
        py::call_guard<py::gil_scoped_release>());

    py::class_<EmptyStruct>(m, "EmptyStruct")
        .def_readonly_static("SharedInstance", &SharedInstance);

    // NOTE: std::string_view also uses loader_life_support to ensure that
    // the string contents remain alive, but that's a C++ 17 feature.
}
20 changes: 20 additions & 0 deletions tests/test_thread.py
@@ -47,3 +47,23 @@ def test_implicit_conversion_no_gil():
        x.start()
    for x in [c, b, a]:
        x.join()


@pytest.mark.skipif(sys.platform.startswith("emscripten"), reason="Requires threads")
def test_bind_shared_instance():
    nb_threads = 4
    b = threading.Barrier(nb_threads)

    def access_shared_instance():
        b.wait()
        for _ in range(1000):
            x = m.EmptyStruct.SharedInstance
Collaborator

Is the explicit del here needed? Could this be simplified to

        for _ in range(1000):
            m.EmptyStruct.SharedInstance

?

I asked ChatGPT and it seems to think the simpler code is equivalent. If that's not correct, could you please add a comment to explain (super terse would be fine)?

Contributor Author

Yeah, it's effectively the same

Contributor Author

@colesbury, Jan 16, 2025

Hmm... ruff doesn't like the "useless expression":


tests/test_thread.py:60:13: B018 Found useless expression. Either assign it to a variable or remove it.
   |
58 |         b.wait()
59 |         for _ in range(1000):
60 |             m.EmptyStruct.SharedInstance
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ B018
61 | 
62 |     threads = [
   |

Found 1 error.

Collaborator

An alternative is:

            m.EmptyStruct.SharedInstance  # noqa: B018

That would be my preference, but only very slightly so. With your comment it's also immediately obvious that there is nothing special about the del.

Please let me know if you prefer to keep this as is. I'll merge this PR when I see that the CI is green.

Contributor Author

I'll update it to use the # noqa

            del x

    threads = [
        threading.Thread(target=access_shared_instance) for _ in range(nb_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()