Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[smart_holder] test_class_sh_mi_thunks (started from PR #4374) #4380

Merged
merged 15 commits into from
Dec 13, 2022

Conversation

rwgk
Copy link
Collaborator

@rwgk rwgk commented Dec 2, 2022

Description

This PR fixes failures for the tests originally added under PR #4374, for shared_ptr and unique_ptr. The fix for the unique_ptr case is not great, it conflicts with proper handling of shared_from_this (see PR #3023). That is probably a very rare edge case though, resulting in a bad_weak_ptr exception in the worst case.

It would not be a surprise if there are more oversights of the kind fixed in this PR. The use of C++ multiple inheritance is highly discouraged at Google, and the tool the smart_holder was developed for (http://github.com/google/clif) does not support MI at all, therefore the interaction of MI with other features is not nearly as well exercised as other things.

It would be ideal to add more testing (and possibly fixes) for interactions with other features (disowning/re-owning, factory functions (init), trampolines, shared_from_this), but we will not be able to prioritize that anytime soon. If you run into issues, please send PRs with reproducers, similar to PR #4374, we will try our best to fix.

Suggested changelog entry:

@rwgk
Copy link
Collaborator Author

rwgk commented Dec 2, 2022

@bluescarni, @Skylion007

I think there is some serious bug in the core MI handling code that manifests itself with class_ only if passing shared_ptr, but with classh (aka class_<T, smart_holder>) always (unsurprisingly actually, the way the smart_holder_type_casters are implemented: it should either work the same for all kinds of pointers, or not work the same).

See the OK, FAIL comments in test_mi_debug.cpp.

I don't have a clue what could be wrong, thus far I used the core MI handling code only as a black box, more or less.

A starting point for debugging would be to inspect the wrapped object created when make_derived_as_base0 returns.

@rwgk
Copy link
Collaborator Author

rwgk commented Dec 2, 2022

It turns out this is a problem in the user code:

-        auto ret = std::shared_ptr<Base0>(ret_der);
+        auto ret = std::dynamic_pointer_cast<Base0>(ret_der);

std::shared_ptr isn't as smart about casting as all of us were hoping/assuming.
EDIT: Maybe it is, I don't know.

@rwgk
Copy link
Collaborator Author

rwgk commented Dec 2, 2022

It turns out only my print statements looked good, but the code is still failing/succeeding the same way as before.

Needs more work.

@bluescarni
Copy link
Contributor

bluescarni commented Dec 2, 2022

It turns out this is a problem in the user code:

-        auto ret = std::shared_ptr<Base0>(ret_der);
+        auto ret = std::dynamic_pointer_cast<Base0>(ret_der);

std::shared_ptr isn't as smart about casting as all of us were hoping/assuming.

Not sure I understand, aren't those lines equivalent?

@bluescarni
Copy link
Contributor

It turns out this is a problem in the user code:

-        auto ret = std::shared_ptr<Base0>(ret_der);
+        auto ret = std::dynamic_pointer_cast<Base0>(ret_der);

std::shared_ptr isn't as smart about casting as all of us were hoping/assuming.

Not sure I understand, aren't those lines equivalent?

Ok so they don't look completely equivalent, but they should become identical if you replace dynamic_pointer_cast (which is unnecessary in this case) with static_pointer_cast.

In both cases (dynamic vs static cast), the operation is safe and well defined. Do you agree @rwgk ?

@rwgk
Copy link
Collaborator Author

rwgk commented Dec 2, 2022

In both cases (dynamic vs static cast), the operation is safe and well defined. Do you agree @rwgk ?

I backtracked, and now think I somehow missed a step before in my experiments (making two changes instead of one or similar): the behavior difference that I attributed to shared_ptr vs dynamic_pointer_cast I can only reproduce by switching between USE_SH defined or not. The shared_ptr/dynamic_pointer_cast change does not make a difference.

Concretely, with this diff

     m.def("make_derived_as_base0", []() -> std::shared_ptr<Base0> {
         auto ret_der = std::make_shared<Derived>();
+        printf("\nLOOOK [%lu] %s:%d\n", (std::size_t) ret_der.get(), __FILE__, __LINE__); fflush(stdout);
         auto ret = std::dynamic_pointer_cast<Base0>(ret_der);
+        printf("\nLOOOK [%lu] %s:%d\n", (std::size_t) ret.get(), __FILE__, __LINE__); fflush(stdout);
         return ret;
     });

@@ -66,7 +68,10 @@ TEST_SUBMODULE(mi_debug, m) {

     // class_ OK
     // classh FAIL
-    m.def("get_vec_size_raw_ptr_derived", [](const Derived *obj) { return obj->vec.size(); });
+    m.def("get_vec_size_raw_ptr_derived", [](const Derived *obj) {
+        printf("\nLOOOK [%lu] %s:%d\n", (std::size_t) obj, __FILE__, __LINE__); fflush(stdout);
+        return obj->vec.size();
+    });

and USE_SH not defined I see this when running test_get_vec_size_raw_ptr_derived:

test_mi_debug.py::test_get_vec_size_raw_ptr_derived
LOOOK [94030593024464] /usr/local/google/home/rwgk/forked/pybind11/tests/test_mi_debug.cpp:53

LOOOK [94030593024496] /usr/local/google/home/rwgk/forked/pybind11/tests/test_mi_debug.cpp:55

LOOOK [94030593024464] /usr/local/google/home/rwgk/forked/pybind11/tests/test_mi_debug.cpp:72
PASSED

With USE_SH defined this:

test_mi_debug.py::test_get_vec_size_raw_ptr_derived
LOOOK [93905502302672] /usr/local/google/home/rwgk/forked/pybind11/tests/test_mi_debug.cpp:53

LOOOK [93905502302704] /usr/local/google/home/rwgk/forked/pybind11/tests/test_mi_debug.cpp:55

LOOOK [93905502302704] /usr/local/google/home/rwgk/forked/pybind11/tests/test_mi_debug.cpp:72
FAILED

@rwgk
Copy link
Collaborator Author

rwgk commented Dec 2, 2022

Please take a look at commit 248ac89.

With that all 3 test_mi_debug.py tests pass, with USE_SH not defined or defined.

That makes me think there is a problem in processing the shared_ptr<Base0> return.

But, with this additional diff

 def test_get_vec_size_raw_ptr_derived():
+    obj = m.make_derived_as_base0()
+    print("\nLOOOK", obj)
     obj = m.make_derived_as_base0_raw_ptr()
+    print("\nLOOOK", obj)
     assert m.get_vec_size_raw_ptr_derived(obj) == 5

there is this output:

test_mi_debug.py::test_get_vec_size_raw_ptr_derived
LOOOK [94594958013248] /usr/local/google/home/rwgk/forked/pybind11/tests/test_mi_debug.cpp:59

LOOOK [94594958013280] /usr/local/google/home/rwgk/forked/pybind11/tests/test_mi_debug.cpp:62

LOOOK <pybind11_tests.mi_debug.Derived object at 0x7f531044a970>

LOOOK [94594958047472] /usr/local/google/home/rwgk/forked/pybind11/tests/test_mi_debug.cpp:71

LOOOK [94594958047504] /usr/local/google/home/rwgk/forked/pybind11/tests/test_mi_debug.cpp:74

LOOOK <pybind11_tests.mi_debug.Derived object at 0x7f531044a930>

LOOOK [94594958047472] /usr/local/google/home/rwgk/forked/pybind11/tests/test_mi_debug.cpp:93
PASSED

I.e. pybind11 figured out that the returned shared_ptr<Base0> is actually a Derived, but apparently it is storing the Base0 pointer anyway.

@bluescarni
Copy link
Contributor

I.e. pybind11 figured out that the returned shared_ptr<Base0> is actually a Derived, but apparently it is storing the Base0 pointer anyway.

I don't think it is an issue that pybind11 is storing the shared_ptr<Base> as return value of make_derived_as_base0().

The problem is that the conversion that happens later, "transforming" the shared_ptr<Base> into the shared_ptr<Derived> argument to get_vec_size_shared_ptr_derived(), is NOT done with the correct tool (i.e., std::dynamic_pointer_cast), but via some other unsafe casting operation.

Are you positive that in the smart holder branch all instances of unsafe casts (including reinterpret_cast but also raw C-style casts) have been removed?

@bluescarni
Copy link
Contributor

I won't have any time this weekend, but next week I should be able to re-run the debugger in the smart holder branch and try to follow the code there. @rwgk

@rwgk
Copy link
Collaborator Author

rwgk commented Dec 3, 2022

Are you positive that in the smart holder branch all instances of unsafe casts (including reinterpret_cast but also raw C-style casts) have been removed?

Not sure anymore. I was thinking yes, but evidently not.

The reinterpret_cast you found is still used, but for py::smart_holder it's well defined AFAICT, because py::smart_holder is not templated.

@rwgk
Copy link
Collaborator Author

rwgk commented Dec 3, 2022

@bluescarni I think for the instance-from-shared_ptr case I found the bug & fix, see 25c3dc9:

-        auto smhldr = pybindit::memory::smart_holder::from_shared_ptr(src);
+        auto smhldr = pybindit::memory::smart_holder::from_shared_ptr(std::shared_ptr<void>(src, const_cast<void *>(st.first)));

In hindsight a blatant oversight.

There could be a similar problem for the instance-from-unique_ptr case, I need to look more. That will probably be more tricky to solve, because the unique_ptr deleter is a templated type rather than a dynamic type as for shared_ptr (I'm not sure my choice of terms is precise, but hopefully enough to give the idea).

I think I will want to test a full matrix of (from_raw_ptr, from_unique_ptr, from_shared_ptr) x (as_raw_ptr, as_unique_ptr, as_shared_ptr) and (from_base0, from_derived) x (as_base0, as_derived) combinations.

I don't think I'll have the spare time to backport the fix(es) to pybind11 master.

rwgk added 8 commits December 9, 2022 11:25
```diff
-        auto smhldr = pybindit::memory::smart_holder::from_shared_ptr(src);
+        auto smhldr = pybindit::memory::smart_holder::from_shared_ptr(std::shared_ptr<void>(src, const_cast<void *>(st.first)));
```
@rwgk
Copy link
Collaborator Author

rwgk commented Dec 10, 2022

@bluescarni The latest version of this PR passes testing with all clang sanitizers (Google toolchain). The 2 CI failures here are flakes I've seen before. EDIT: The CI had no failures after reruns. I initiated Google global testing which will take 6+ hours.

The fix for the unique_ptr case isn't great, it conflicts with proper handling of shared_from_this (see PR #3023). That's probably a very rare edge case though, resulting in a bad_weak_ptr exception in the worst case.

I wouldn't be surprised if there are more oversights of the kind fixed in this PR. The use of C++ multiple inheritance is highly discouraged at Google, and the tool I need the smart_holder for (http://github.com/google/clif) does not support MI at all, therefore the interaction of MI with other features is not nearly as well exercised as other things. When I was done with my initial iteration of the smart_holder implementation, I tried to guess things that could go wrong in test_class_sh_disowning_mi.py, but your test case made it obviously that my attempts fell flat. If you have more tests, please let me know. I'll try to fix what I can.

@rwgk
Copy link
Collaborator Author

rwgk commented Dec 10, 2022

This PR also passed Google-internal global testing (internal testing ID OCL:494341648:BASE:494408109:1670696135236:b43772d2).

I think it would be ideal to add more testing (and possibly fixes) for interactions with other features (disowning/re-owning, factory functions (init), trampolines, shared_from_this) but I will not be able to prioritize that anytime soon. I'll rename the new test, then mail it for review asap.

Anyone looking here: please don't hesitate to comment.

@rwgk rwgk changed the title [smart_holder] test_mi_debug (started from PR #4374) [smart_holder] test_class_sh_mi_thunks (started from PR #4374) Dec 10, 2022
@rwgk
Copy link
Collaborator Author

rwgk commented Dec 13, 2022

Thanks for the reviews!

@rwgk rwgk marked this pull request as ready for review December 13, 2022 03:55
@rwgk rwgk requested a review from henryiii as a code owner December 13, 2022 03:55
@rwgk rwgk merged commit 4766065 into pybind:smart_holder Dec 13, 2022
@rwgk rwgk deleted the sh_mi_debug branch December 13, 2022 03:55
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Dec 13, 2022
@rwgk rwgk removed the needs changelog Possibly needs a changelog entry label Dec 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants