Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse rustc target names #1413

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
Open

Conversation

madsmtm
Copy link
Collaborator

@madsmtm madsmtm commented Feb 24, 2025

Parse target names instead of using a pre-generated list. Allows target names like those used by certain Linux distributions, even when compiling outside build.rs.

I feel fairly confident that the target parsing is complete, and that the parsing approach taken (described in the code, but roughly "parse arch, then env/ABI, then OS, and lastly vendor") is correct.
Unfortunately, since this information is now parsed instead of pre-generated, we cannot just update the information automatically with a PR. Instead, I've added a CI step that opens an issue comment in #1426 whenever a nightly rustc target names fails to parse correctly. This isn't as nice, but it's at least workable.

Note that ensuring we construct the same LLVM triples as rustc is a lot more work; I've started on it a little bit, but it will have to be postponed (it probably also requires a lot of tweaks to rustc to really be nice). So for now, I opted to keep a pre-generated list of rustc to LLVM triples.

Fixes #1297.
Fixes #1317.
Fixes #1405.
Fixes #1407.

But keep using a generated LLVM/Clang triple mapping for now.
@NobodyXu
Copy link
Collaborator

Hmmm the solution here is more complex than I'd imagined.

I was thinking of simply splitting target, and replace alpine/* with unknown if os is linux

@madsmtm
Copy link
Collaborator Author

madsmtm commented Feb 28, 2025

I was thinking of simply splitting target, and replace alpine/* with unknown if os is linux

Yeah, and that approach would solve the Alpine/Chimera issues, but won't solve the problem people have otherwise been having regarding Rust bootstrapping and custom/new targets, see rust-lang/rust#137064 / rust-lang/rust#137460 (bootstrap has downgraded to cc v1.1.22 because of this, which breaks a lot of other things).

Personally, I'm still a little bit on the fence about this, it'd be really nice if we could just only support upstream targets. But ultimately, I don't think that's a good decision for the Rust project as a whole.

@madsmtm
Copy link
Collaborator Author

madsmtm commented Feb 28, 2025

WDYT @ChrisDenton? I get the feeling you're more in touch with other rustc devs than I, what's your general take?

@madsmtm
Copy link
Collaborator Author

madsmtm commented Feb 28, 2025

For completeness, I'll note that there are other options, I just dislike them even more than this:

  • Bootstrap sets TARGET, CARGO_CFG_TARGET_ARCH, CARGO_CFG_TARGET_VENDOR, CARGO_CFG_TARGET_OS, CARGO_CFG_TARGET_ENV and CARGO_CFG_TARGET_ABI before invoking cc.
  • We use the pre-generated list, and fall back to very simple "split into three/four parts, and assume arch-vendor-os/arch-vendor-os-env" if not available there.

@NobodyXu
Copy link
Collaborator

but won't solve the problem people have otherwise been having regarding Rust bootstrapping and custom/new targets, see rust-lang/rust#137064 / rust-lang/rust#137460 (bootstrap has downgraded to cc v1.1.22 because of this, which breaks a lot of other things).

I got it, but it feels like this PR has hard coded so many stuff and make maintaining harder

I still prefer some of the information to be generated from rustc or somewhere else instead of hardcoding everything, maybe we can modify gen-target-info to check env/abi etc for non-straight forward mappings and put them in a sorted array?

@ChrisDenton
Copy link
Member

From a rustc perspective, we do want cc-rs to work with new/custom targets as well as possible. Though that doesn't necessarily mean accepting any arbitrary target string. There could be at least some rules if it helps.

Also there are perhaps some optimizations here. How much of the target info do we actually use? I suspect we only care about the targets we have special cases for, e.g. targert_os=="windows" etc.

Discerning the llvm target is a bit trickier. Though I think in part it's made worse by rustc itself not always being consistent (or even accurate, particularly for tier 3 targets). That is something that could be improved on the rustc side.

I'd also add that llvm deals with parts of the triple it doesn't understand by replacing them with "unknown". I'm not sure if we can/should take advantage of that but I thought I should note it.

madsmtm and others added 2 commits February 28, 2025 11:48
Co-authored-by: Jiahao XU <30436523+NobodyXu@users.noreply.github.com>
Co-authored-by: Jiahao XU <30436523+NobodyXu@users.noreply.github.com>
@madsmtm
Copy link
Collaborator Author

madsmtm commented Feb 28, 2025

I got it, but it feels like this PR has hard coded so many stuff and make maintaining harder

I don't like it either, but it really seems like the only option to me.

I still prefer some of the information to be generated from rustc or somewhere else instead of hardcoding everything, maybe we can modify gen-target-info to check env/abi etc for non-straight forward mappings and put them in a sorted array?

Hmm, not a bad idea, though it wouldn't help to reduce the majority of the work here, it would only really help for this part of the parsing (e.g. parse_arch is still necessary, as is parse_envabi).

The bigger problem is that "non-straight forward mappings" is hard to define, since the mapping isn't defined at all!

From a rustc perspective, we do want cc-rs to work with new/custom targets as well as possible. Though that doesn't necessarily mean accepting any arbitrary target string. There could be at least some rules if it helps.

I doubt cc-rs has the sway to say no to a proposed target name, that's the purview of t-compiler, no? But yes, I'd really like some rules for target names!

Also there are perhaps some optimizations here. How much of the target info do we actually use? I suspect we only care about the targets we have special cases for, e.g. targert_os=="windows" etc.

Probably true, I think target_env isn't really used for anything other than on Windows (msvc/gnu) and for musl atm. But we'll likely need all the information to generate the right LLVM triple, and I suspect it'll be useful for the CROSS_COMPILE prefix too in the future.

Discerning the llvm target is a bit trickier. Though I think in part it's made worse by rustc itself not always being consistent (or even accurate, particularly for tier 3 targets). That is something that could be improved on the rustc side.

Indeed, and I plan to do that in the future (but wanted to land this PR first).

@ChrisDenton
Copy link
Member

I doubt cc-rs has the sway to say no to a proposed target name, that's the purview of t-compiler, no? But yes, I'd really like some rules for target names!

It has as much sway as the rest of the project, The best thing would be to make an MCP (or maybe it'd require an RFC) to specify some constraints we'd want or otherwise define how to choose target tuples for new targets. Otherwise it would be a case of monitoring new target MCPs and raising objections if their naming isn't great (which would be more hassle at the end of the day).

@madsmtm
Copy link
Collaborator Author

madsmtm commented Mar 1, 2025

CC @workingjubilee, you've been fairly involved with rustc target names. Could you verify my assumptions about the target names (and if you have time, look over the parsing code)?

Core assumptions about rustc target names:

  • Each target name has 2, 3 or 4 components.
  • 2-components names are of the format arch-os.
  • 3-components names are of either the format arch-vendor-os or arch-os-env+abi.
    • To differentiate these, we inspect the last component, and try to parse the env/ABI - and if that fails, assume that the target is of the format arch-vendor-os.
    • I guess we could try to parse the OS instead, but the list of OSes is long and may be changed by custom toolchains, so that seems more likely to cause problems? Besides, we already have logic to split the env/ABI, might as well re-use it for this.
  • 4-components names are of the format arch-vendor-os-env+abi.

@madsmtm madsmtm changed the title Parse target triples Parse rustc target names Mar 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants