-
Notifications
You must be signed in to change notification settings - Fork 729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pluggable output backends #2943
Comments
Hey 👋 I support this proposal for several reasons. The first comes from a "human-cost" perspective and its based around the idea that we should join efforts so we can have more eyes in fewer tools instead of having more tools being maintained in isolation. The second one is that even though this sounds like a pretty big change as a whole, most of the bold things stated here are already done by bindgen in one way or another:
The general idea of having gated outputs sounds reasonable for me, specially since it doesn't affect the default behavior of bindgen and it won't break other people's code. Relying on My only remaining question here is how do we start? Would anyone be willing to submit some patches with this? or do you consider we should have some discussions before to iron out some kinks? |
Yeah, not opposed to something like this, but FWIW you can already do stuff like "replace E.g., for a header like: // t.hpp
#include <string>
struct Something {
std::string m_member;
}; And a bindgen invocation like:
You get this (cleaned up a little bit): /* automatically generated by rust-bindgen 0.70.1 */
#[allow(non_snake_case, non_camel_case_types, non_upper_case_globals)]
pub mod root {
#[allow(unused_imports)]
use self::super::root;
pub mod std {
#[allow(unused_imports)]
use self::super::super::root;
pub use cxx::CxxString as string;
}
#[repr(C)]
pub struct Something {
pub m_member: root::std::string,
}
#[allow(clippy::unnecessary_operation, clippy::identity_op)]
const _: () = {
["Size of Something"][::std::mem::size_of::<Something>() - 32usize];
["Alignment of Something"]
[::std::mem::align_of::<Something>() - 8usize];
["Offset of field: Something::m_member"]
[::std::mem::offset_of!(Something, m_member) - 0usize];
};
} We use that extensively in Firefox FWIW. |
Thanks for the positive response! @emilio yep! The prelude and command-line which So that's why I'm first broaching this as a kind of "philosophy of bindgen" discussion: these hypothetical pluggable backends might make quite different decisions to what bindgen does now:
I guess one way forward would be for me to write a document or something along the lines of "if autocxx+cxx had been implemented as a bindgen plugin, what would it look like?" and then y'all can see how revolted you are :) Does that sound like a plan? I am unlikely to have time to actually reimplement autocxx (it's 44 KLOC, and I'm not sure that's the right move anyway) but perhaps such a document can move the discussion forward? Like you, I'm really interested in joining efforts to reduce duplication. |
I think this is fine and it might be an actual advantage as having a smaller surface of code that's actually accepted might make the whole task more approachable. I'm just shooting at the air here but maybe we could expose a limited subset of bindgen internals so we can write these backends as separate crates, where the existing backend is just one of these crates.
That sounds like a good way to kickstart the discussion 🚀 |
This is a major change to autocxx-bindgen which aims to improve maintainability by reducing divergence from upstream bindgen. Specifically, it: * Significantly reduces the textual diffence * Makes the remaining changes less invasive, such that merge conflicts should be easier to resolve * Redoes the changes such that they're more attuned to the current evolution of upstream bindgen, so it may be possible to upstream some or ideally all of these changes. (The ultimate goal is to unfork bindgen!) See google/autocxx#124 and rust-lang#2943 for the background here. Specifically this change: * Removes the #[bindgen_semantic_attributes] which were added to all sorts of items. Instead, * Much more is communicated via the existing ParseCallbacks mechansim. * In some cases, it's still necessary to annotate individual types - in this case we generate a newtype wrapper instead of attributes. This commit also re-enables the bindgen test suite. It does not yet add tests for all the above new functionality; that's yet to come.
An update here: since I wrote this, it seems that bindgen's |
This PR reports the memory layout information (size, alignment, packed-ness) for each struct into the ParseCallbacks. The use-case for this is fairly complex. Postprocessors such as autocxx currently do not necessarily pass all bindgen's output through: autocxx is opinionated in deciding what C++ types are "reasonable" to expose to Rust. For example types containing C++ strings aren't represented to Rust by-value. Instead, these types are represented as "opaque" types in the cxx sense (https://cxx.rs/extern-c++.html#opaque-c-types). As far as Rust is concerned, these types have no fields and can only be poked at using methods. In order to make these allocable on the stack, autocxx _does_ need to know the layout of these structs, so it can make an "opaque" fake struct in lieu of the real one. That's what is enabled by this callback. autocxx and other tools can post-process the bindgen output to replace bindgen's generated type with an opaque data-only struct like this: #[repr(align(16),packed)] pub struct TheType { _data: UnsafeCell<MaybeUninit<[u8;4]>> // etc. } // the actual struct is quite a bit more complex The extra information provided in the callback within this PR allows that. For completeness, there are multiple ways that the whole stack could be rearchitected to avoid the need for this. 1. We could teach bindgen all the rules for when a struct should be "opaque", then bindgen could generate it that way in the first place. This suggests major extensibility of bindgen to allow highly opinionated backends, along the lines of rust-lang#2943. 2. That decision could be delegated to the ParseCallbacks with a new fn make_struct_opaque(&self, some_struct_info) -> bool call. Although this sounds simple, the decisions required here are very complex and depend (for instance) upon all the different constructors, ability to represent template params, etc., so in practice this would require post-processors to run bindgen twice, once to gather all that information and then again to give truthful answers to that callback. 3. Post-processors such as autocxx could instead generate opaque types like this: #[repr(transparent)] pub struct TheType { _hidden_field: UnsafeCell<MaybeUninit<TheTypeAsGeneratedByBindgen>> } // it's more complex in reality or similar. This turns out to be very complex because TheTypeAsGeneratedByBindgen might include complex STL structures such as std::string which autocxx currently instructs bindgen to replace with a simpler type, precisely so it can avoid all this complexity. Any of these options are months of work either in bindgen or autocxx or both, so I'm hoping that it's OK to simply emit the layout information which bindgen already knows. Compatibility: this is a further compatibility break to the DiscoveredItem enum. However, it seems to be accepted that we're going to grow that enum over time. If the compatibility break is a concern this information could be reported via a separate callback. Part of google/autocxx#124
I was chatting with @pvdrz at the Rust for Linux conference about some bold ideas for
bindgen
and he thought I should write them up, so here goes.The problem space
cxx is a pretty popular C++ interop tool these days. Its biggest advantage is that Rust code manipulates C++ data using friendly vocabulary types such as
cxx::UniquePtr<T>
andcxx::CxxString
. These make FFI much safer in practice than bindgen-style bindings, because for example there's no need to track object lifetimes, nor string lengths, etc. Irrespective of the specifics of the policy relating to theunsafe
keyword, cxx-style interop is much less error prone in practice than bindgen-style FFI.However,
cxx
can't generate such bindings from pre-existing C++ headers. The full language boundary needs to be declared in an IDL (which looks a lot like Rust code but actually isn't quite).For years now, the Chromium project has wanted to combine the ergonomic advantages of cxx with bindgen-style generation from headers:
autocxx
, a project which literally connectsbindgen
intocxx
crubit
, a similar concept not reliant onbindgen
norcxx
The idea
The idea is: bindgen can output different "flavors" of bindings. Perhaps this is pluggable, perhaps there are multiple flavors built in.
std::unique_ptr
and substitutes them withcxx::UniquePtr
, generating both C++ and Rust side code to make this possible. This is basically whatautocxx
does but it's really pretty complicated since it's a post-processor, a lot of the complexity would drop away if it were built into bindgen itself.CppRef<T>
type for C++ references, etc.What this would require
It would require:
Simplest first step
All this sounds vague. A hypothetical first step could be a
--emit-cxx-crate-strings
which:const &std::string
. If a C++ function takes such a parameter,&cxx::CxxString
and then calls the real functionThis is just a small step, but users of both
cxx
andbindgen
could immediately benefit because they can create such strings in Rust.Simplest second step
The next step after that would be to support
std::string
by value, which would require either cunning to store self-referential types on the Rust stack or, more likely, wrapping thestd::string
in astd::unique_ptr
. This would require a small C++ shim function to be generated to unwrap it then pass it into the original C++ API.The text was updated successfully, but these errors were encountered: