Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abstract naming stuff #520

Open
adetaylor opened this issue May 23, 2021 · 4 comments
Open

Abstract naming stuff #520

adetaylor opened this issue May 23, 2021 · 4 comments
Labels
architectural Architectural improvement good first issue Good for newcomers

Comments

@adetaylor
Copy link
Collaborator

We should deal with names for functions, types and typedefs in a similar way, so that things like #494 are fixed for all of them. We should see what we can make uniform for the different analysis phases, maybe extracting to a new phase.

@adetaylor
Copy link
Collaborator Author

The final piece of the puzzle here is to create newtype wrappers for all the following.

|| Not cxx safe || Cxx safe ||
| Ident | CxxSafeIdent |
| String | CxxSafeString |
| QualifiedName | CxxSafeQualifiedName |
| Namespace | CxxSafeNamespace |

then arrange for fun/mod.rs to return a CxxSafeIdent in its cxxbridge_name function.

@adetaylor
Copy link
Collaborator Author

It's a little orthogonal to this issue, but we may also be able to use this callback to track original names, thus unforking bindgen a little.

adetaylor added a commit that referenced this issue Jan 29, 2025
This is an internal change which begins to dig us out of the mess of naming we
have inside autocxx. We deal with all sorts of different kinds of identifiers -
for Rust, for cxx, for C++, from bindgen - and convert between them in a fairly
ad-hoc basis. It's all too fragile to be able to tackle changes like #1371.

Fortunately, Rust makes it easy to solve these sorts of messes using newtype
wrappers. Unfortunately, I've tried that in the past and it's got to be a
huge muddle.

Here's another attempt at carving off part of the space. This introduces
such wrappers for two types of name:
* The [bindgen_original_name] identifiers discovered in bindgen
  attributes;
* The final name to be used for C++ function calls.

There are various FIXMEs added in locations where we're converting to or from
other types of name in questionable ways.

A bit of #520.
adetaylor added a commit that referenced this issue Jan 30, 2025
This is an internal change which begins to dig us out of the mess of naming we
have inside autocxx. We deal with all sorts of different kinds of identifiers -
for Rust, for cxx, for C++, from bindgen - and convert between them in a fairly
ad-hoc basis. It's all too fragile to be able to tackle changes like #1371.

Fortunately, Rust makes it easy to solve these sorts of messes using newtype
wrappers. Unfortunately, I've tried that in the past and it's got to be a
huge muddle.

Here's another attempt at carving off part of the space. This introduces
such wrappers for two types of name:
* The [bindgen_original_name] identifiers discovered in bindgen
  attributes;
* The final name to be used for C++ function calls.

There are various FIXMEs added in locations where we're converting to or from
other types of name in questionable ways.

A bit of #520.
@adetaylor
Copy link
Collaborator Author

I started to do some newtype wrappers in #1431. There's lots more to do.

This was referenced Feb 10, 2025
@adetaylor
Copy link
Collaborator Author

adetaylor commented Feb 24, 2025

Here's a writeup of the current known status of name handling in autocxx. I consider this probably the worst area of technical debt within autocxx, aside from its reliance on a forked version of bindgen which is being tackled in #124.

What's complicated about naming?

autocxx has to deal with many different kinds of names.

  • Within bindgen output, within a hierarchical namespace of mods corresponding to C++ namespaces:
    • The name given by bindgen
    • The original name in C++ (which may differ)
    • The name within the namespace (which may differ again, if the type is nested further within another type)
  • The name we give to cxx (which is not hierarchic; everything must have a unique name)
  • The name we expose in Rust (which probably points to the cxx item, but should be named more like the bindgen Rust name)

In addition,

  • When we and bindgen handle overloads, a single C++ name may have multiple names throughout the Rust stack
  • Unlike bindgen, we expose Rust functions and types to C++, so we need to generate C++ names from Rust names
  • With our subclass handling, there will be calls from C++ to Rust back to C++, so it's kind of a perfect storm

There are myriad opportunities for getting this wrong, and I'm sure there are dozens of bugs hiding in all this.

What should the basic strategy be?

Overall, with the benefit of hindsight:

  • For an existing C++ thing, the eventual name we expose in Rust should (usually?) match the name bindgen generates for use in Rust.
  • We may generate a different name for use in the cxx bridge, but the user probably doesn't need to know that.
  • Each of these different kinds of names needs to be wrapped in a newtype wrapper inside autocxx to avoid confusion. Newtype wrappers for some names. #1431 made some good progress here.
  • At each point of conversion from one kind of name to another, invariants should be applied. e.g. it shouldn't be possible to generate a name for the cxx bridge without ensuring that the name is unique, and that it doesn't conflict with built-in cxx symbols (UniquePtr etc.)
  • Where we do fancy things to names, e.g. uniquify them, we should do it consistently for all the different kinds of APIs.

What needs doing here

  • All the kinds of names need to be wrapped in newtype wrappers. Newtype wrappers for some names. #1431 did half the job.
  • Invariants need to be applied on creation of each kind of name. This hasn't yet been done.
  • Currently, FnAnalyzer::analyze_functions does a lot of the work to convert between these kinds of names for functions only. It avoids various conflicts and problems. None of that logic is applied to other kinds of APIs, e.g. types. We should try to abstract its logic and do that. (This is, broadly, We do not uniqueify cxx::bridge names except for functions #486).
  • None of the above should really change autocxx's current behavior except it should make us better at handling identically-named types in multiple namespaces (for which we should add any missing testcases)
  • Once we've done that, we should take a hard look at cases where the autocxx-exposed name differs from the bindgen-exposed name, and see if we can undo such cases.

PRs very welcome for any part of this. Some of these steps will be quite easy so tagging this as good-first-issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
architectural Architectural improvement good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

1 participant