Skip to content

Commit

Permalink
Include struct layout in callbacks.
Browse files Browse the repository at this point in the history
This PR reports the memory layout information (size, alignment, packed-ness)
for each struct into the ParseCallbacks.

The use-case for this is fairly complex. Postprocessors such as autocxx
currently do not necessarily pass all bindgen's output through: autocxx is
opinionated in deciding what C++ types are "reasonable" to expose to Rust. For
example types containing C++ strings aren't represented to Rust
by-value. Instead, these types are represented as "opaque" types in the cxx
sense (https://cxx.rs/extern-c++.html#opaque-c-types). As far as Rust is
concerned, these types have no fields and can only be poked at using methods.

In order to make these allocable on the stack, autocxx _does_ need to know
the layout of these structs, so it can make an "opaque" fake struct in lieu
of the real one. That's what is enabled by this callback. autocxx and other
tools can post-process the bindgen output to replace bindgen's generated
type with an opaque data-only struct like this:

  #[repr(align(16),packed)]
  pub struct TheType {
    _data: UnsafeCell<MaybeUninit<[u8;4]>> // etc.
  } // the actual struct is quite a bit more complex

The extra information provided in the callback within this PR allows that.

For completeness, there are multiple ways that the whole stack could
be rearchitected to avoid the need for this.

1. We could teach bindgen all the rules for when a struct should be
   "opaque", then bindgen could generate it that way in the first place.
   This suggests major extensibility of bindgen to allow highly
   opinionated backends, along the lines of #2943.

2. That decision could be delegated to the ParseCallbacks with a new
      fn make_struct_opaque(&self, some_struct_info) -> bool
   call. Although this sounds simple, the decisions required here are
   very complex and depend (for instance) upon all the different
   constructors, ability to represent template params, etc., so
   in practice this would require post-processors to run bindgen twice,
   once to gather all that information and then again to give
   truthful answers to that callback.

3. Post-processors such as autocxx could instead generate opaque types
   like this:
     #[repr(transparent)]
     pub struct TheType {
       _hidden_field:
         UnsafeCell<MaybeUninit<TheTypeAsGeneratedByBindgen>>
     } // it's more complex in reality
   or similar. This turns out to be very complex because
   TheTypeAsGeneratedByBindgen might include complex STL structures
   such as std::string which autocxx currently instructs bindgen to
   replace with a simpler type, precisely so it can avoid all this
   complexity.

Any of these options are months of work either in bindgen or autocxx
or both, so I'm hoping that it's OK to simply emit the layout
information which bindgen already knows.

Compatibility: this is a further compatibility break to the
DiscoveredItem enum. However, it seems to be accepted that we're going
to grow that enum over time. If the compatibility break is a concern
this information could be reported via a separate callback.

Part of google/autocxx#124
  • Loading branch information
adetaylor committed Feb 20, 2025
1 parent 0df4256 commit 27c8e5e
Show file tree
Hide file tree
Showing 4 changed files with 38 additions and 8 deletions.
30 changes: 27 additions & 3 deletions bindgen-tests/tests/parse_callbacks/item_discovery_callback/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,9 @@ use std::rc::Rc;

use regex::Regex;

use bindgen::callbacks::{DiscoveredItem, DiscoveredItemId, ParseCallbacks};
use bindgen::callbacks::{
DiscoveredItem, DiscoveredItemId, Layout, ParseCallbacks,
};
use bindgen::Builder;

#[derive(Debug, Default)]
Expand Down Expand Up @@ -37,6 +39,11 @@ pub fn test_item_discovery_callback() {
DiscoveredItem::Struct {
original_name: Some("NamedStruct".to_string()),
final_name: "NamedStruct".to_string(),
layout: Some(Layout {
size: 0,
align: 1,
packed: false,
}),
},
),
(
Expand Down Expand Up @@ -78,6 +85,11 @@ pub fn test_item_discovery_callback() {
DiscoveredItem::Struct {
original_name: None,
final_name: "_bindgen_ty_*".to_string(),
layout: Some(Layout {
size: 0,
align: 1,
packed: false,
}),
},
),
(
Expand Down Expand Up @@ -122,7 +134,7 @@ fn compare_item_info(
) -> bool {
if std::mem::discriminant(expected_item) !=
std::mem::discriminant(generated_item)
{
{
return false;
}

Expand Down Expand Up @@ -160,6 +172,7 @@ pub fn compare_struct_info(
let DiscoveredItem::Struct {
original_name: expected_original_name,
final_name: expected_final_name,
layout: expected_layout,
} = expected_item
else {
unreachable!()
Expand All @@ -168,6 +181,7 @@ pub fn compare_struct_info(
let DiscoveredItem::Struct {
original_name: generated_original_name,
final_name: generated_final_name,
layout: generated_layout,
} = generated_item
else {
unreachable!()
Expand All @@ -177,12 +191,22 @@ pub fn compare_struct_info(
return false;
}

match (expected_original_name, generated_original_name) {
if !match (expected_original_name, generated_original_name) {
(None, None) => true,
(Some(expected_original_name), Some(generated_original_name)) => {
compare_names(expected_original_name, generated_original_name)
}
_ => false,
} {
return false;
}

match (expected_layout, generated_layout) {
(None, None) => true,
(Some(expected_layout), Some(actual_layout)) => {
expected_layout == actual_layout
}
_ => false,
}
}

Expand Down
5 changes: 5 additions & 0 deletions bindgen/callbacks.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ pub use crate::ir::analysis::DeriveTrait;
pub use crate::ir::derive::CanDerive as ImplementsTrait;
pub use crate::ir::enum_ty::{EnumVariantCustomBehavior, EnumVariantValue};
pub use crate::ir::int::IntKind;
pub use crate::ir::layout::Layout;
use std::fmt;

/// An enum to allow ignoring parsing of macros.
Expand Down Expand Up @@ -192,6 +193,10 @@ pub enum DiscoveredItem {

/// The name of the generated binding
final_name: String,

/// The layout of the structure in memory (size, alignment, etc.) if
/// known.
layout: Option<Layout>,
},

/// Represents a union with its original name in C and its generated binding name
Expand Down
1 change: 1 addition & 0 deletions bindgen/codegen/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2490,6 +2490,7 @@ impl CodeGenerator for CompInfo {
.name()
.map(String::from),
final_name: canonical_ident.to_string(),
layout,
},
CompKind::Union => DiscoveredItem::Union {
original_name: item
Expand Down
10 changes: 5 additions & 5 deletions bindgen/ir/layout.rs
Original file line number Diff line number Diff line change
Expand Up @@ -7,14 +7,14 @@ use crate::ir::context::BindgenContext;
use std::cmp;

/// A type that represents the struct layout of a type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub(crate) struct Layout {
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Ord, PartialOrd)]
pub struct Layout {
/// The size (in bytes) of this layout.
pub(crate) size: usize,
pub size: usize,
/// The alignment (in bytes) of this layout.
pub(crate) align: usize,
pub align: usize,
/// Whether this layout's members are packed or not.
pub(crate) packed: bool,
pub packed: bool,
}

#[test]
Expand Down

0 comments on commit 27c8e5e

Please sign in to comment.