Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autoharness Subcommand #3874

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

carolynzech
Copy link
Contributor

@carolynzech carolynzech commented Feb 4, 2025

This PR introduces an initial implementation for the autoharness subcommand and a book chapter describing the feature. I recommend reading the book chapter before reviewing the code (or proceeding further in this PR description).

Callouts

--harness is to manual harnesses what --include-function and --exclude-function are to autoharness; both allow the user to filter which harnesses/functions get verified. Their implementation is different, however----harness is a driver-only flag, i.e., we still compile every harness, but then only call CBMC on the ones specified in --harness. --include-function and --exclude-function get passed to the compiler. I made this design choice to try to be more aggressive about improving compiler performance--if a user only asks to verify one function and we go try to codegen a thousand, the compiler will take much longer than it needs to. I thought this more aggressive optimization was warranted given that crates are likely to have far many more functions eligible for autoverification than manual Kani harnesses.

(See also the limitations in the book chapter).

Testing

Besides the new tests in this PR, I also ran against s2n-quic to confirm that the subcommand works on larger crates.

Towards #3832

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

We approximated an empty body by checking for TerminatorKind::Return in the first basic block, but it turns out some identity functions have this MIR as well, so remove it. One could argue that identity functions aren't worth checking either, but we'd rather check more than less for now.
@github-actions github-actions bot added the Z-BenchCI Tag a PR to run benchmark CI label Feb 4, 2025
@tautschnig
Copy link
Member

I think we'll have to find a way to report which functions were not considered, or else this creates a soundness risk: a code base may be fully covered by Kani in one version of the code base, then a change is made such that Arbitrary is no longer derived for some parameter type (or a new parameter with a type not supporting Arbitrary is added). At that point a previously fully-covered code base turns into one with partial coverage with no obvious way for the user to detect this.

@carolynzech
Copy link
Contributor Author

carolynzech commented Feb 4, 2025

I think we'll have to find a way to report which functions were not considered, or else this creates a soundness risk: a code base may be fully covered by Kani in one version of the code base, then a change is made such that Arbitrary is no longer derived for some parameter type (or a new parameter with a type not supporting Arbitrary is added). At that point a previously fully-covered code base turns into one with partial coverage with no obvious way for the user to detect this.

@tautschnig It does currently print every function that it did consider, so one could figure out which functions weren't considering by doing a (mental) set difference between all the functions in their crate versus what got printed. So we're not claiming to verify anything that we haven't; it's still on a function-by-function basis, and we print all the functions we considered.
But your point is well taken that changes over time may be difficult to spot. I can explicitly print all the functions we skipped. It's not a fundamental limitation--it just involves saving some more compiler-side metadata for the driver to interpret. This PR is already quite large, so I was trying to avoid introducing too many changes, but if we're uncomfortable merging this PR without it, happy to add it.

@carolynzech carolynzech marked this pull request as ready for review February 4, 2025 21:02
@carolynzech carolynzech requested a review from a team as a code owner February 4, 2025 21:02
Copy link
Contributor

@zhassan-aws zhassan-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A high-level comment: the command name, "autoverify" might give a false impression that the functions that were covered have been verified, even though the harnesses only check the absence of panics and undefined behavior (at least for functions that do not have contracts), whereas "verified" is associated with proving their functional correctness. The same applies to the description in the new chapter.

I suggest we use a different name, e.g. "check-panic-ub" or just "auto"?

docs/src/reference/experimental/autoverify.md Outdated Show resolved Hide resolved
@carolynzech
Copy link
Contributor Author

carolynzech commented Feb 5, 2025

whereas "verified" is associated with proving their functional correctness. The same applies to the description in the new chapter.

@zhassan-aws Point taken about the chapter wording--I can make it clearer that this since this subcommand generates a harness under the hood, it has the same guarantees as a harness, so users shouldn't take it to mean that they're magically getting anything stronger (i.e., functional correctness) by using it.

I'm not sure I agree with the broader point about the "autoverify" name, though--our printed CBMC output refers to "verification results," so it seems like this name is consistent with that convention. That being said, I don't want to bikeshed this too much, so I'll think about some names... I don't love check-panic-ub because we can also check for overflow, check-panic-ub-overflow is too long, and if we ever introduce an additional default check the name will be out of date. Perhaps kani autoharness to hammer home that we're just generating a MIR harness in the end?

@zhassan-aws
Copy link
Contributor

Perhaps kani autoharness to hammer home that we're just generating a MIR harness in the end?

I like this better. kani already implies that we're running verification, and autoharness suggests that the harnesses checked are automatically generated.

@carolynzech carolynzech changed the title Autoverify Subcommand Autoharness Subcommand Feb 6, 2025
@tautschnig
Copy link
Member

But your point is well taken that changes over time may be difficult to spot. I can explicitly print all the functions we skipped. It's not a fundamental limitation--it just involves saving some more compiler-side metadata for the driver to interpret. This PR is already quite large, so I was trying to avoid introducing too many changes, but if we're uncomfortable merging this PR without it, happy to add it.

Having changed the name to autoharness I am less worried, but I'd still like to see a way to log all skipped functions eventually. So I'd appreciate follow-up work towards this. I'd then even suggest more fine-grained information that tells the user why a particular function was skipped.

@carolynzech
Copy link
Contributor Author

But your point is well taken that changes over time may be difficult to spot. I can explicitly print all the functions we skipped. It's not a fundamental limitation--it just involves saving some more compiler-side metadata for the driver to interpret. This PR is already quite large, so I was trying to avoid introducing too many changes, but if we're uncomfortable merging this PR without it, happy to add it.

Having changed the name to autoharness I am less worried, but I'd still like to see a way to log all skipped functions eventually. So I'd appreciate follow-up work towards this. I'd then even suggest more fine-grained information that tells the user why a particular function was skipped.

Sounds good. I can & will create a follow-up PR with UX improvements... I think getting this out and trying to apply to a library as big as the standard library will give us a lot of good ideas about things that can be improved.

@carolynzech carolynzech enabled auto-merge February 6, 2025 15:41
@carolynzech carolynzech disabled auto-merge February 6, 2025 16:12
Copy link
Contributor

@zhassan-aws zhassan-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great so far! I mostly reviewed the tests in this round.

I suggest adding a test with dependencies to ensure we don't generate harnesses for functions outside the local crate.

Also, on the question of not terminating, I think we should run verification with a small default unwind value and a small default timeout and allow the user to control those.

kani-compiler/src/args.rs Outdated Show resolved Hide resolved
Verification failed for - oob_unsafe_array_access
Verification failed for - oob_safe_array_access
Verification failed for - unchecked_mul
Complete - 0 successfully verified functions, 5 failures, 5 total.
Copy link
Contributor

@zhassan-aws zhassan-aws Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused about the output here. Does Kani run two times?

Copy link
Contributor Author

@carolynzech carolynzech Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it runs the automatic harnesses, then the manual harnesses. The purpose of this test is to check that automatic harnesses behave the same as manual ones, so we'd expect the output to look similar. Note that the automatic harness output talks about failed "functions" whereas the manual output talks about failed "harnesses."
The behavior of this subcommand is to run the automatic harnesses and the manual ones. The idea is that if you run autoharness, that's enough; you don't need to go run Kani again afterward to do the rest of your proofs.

See the comment at the top of the Rust file for this test:

// Test the bodies of the automatically generated harnesses:
// do they catch the same failures as manual ones?
// Note that this also indirectly tests that the automatic harness
// subcommand also runs the manual harnesses in the crate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ambivalent about also running the manual harnesses--I feel like this can fall into the bucket of "UX decision we'll revisit once we've tried using it and form opinions." Hard to know what customers will prefer w/o data.

Copy link
Contributor

@zhassan-aws zhassan-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine that in the future, we would want to introduce a #[kani::autoharness] attribute that functions can be annotated with (or just allow #[kani::proof] directly on functions with arguments, see #1919). Perhaps, autoharness would add those annotations automatically.

pub autoharness_included_functions: Vec<String>,
/// If we are running the autoharness subcommand, the functions to exclude from autoverification
#[arg(long = "autoharness-exclude-function", num_args(1))]
pub autoharness_excluded_functions: Vec<String>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option we can consider adding (perhaps in a follow-up PR) is one to only apply autoharness to public functions in a crate. Non-public functions often have pre-conditions, and may fail without constraints on the symbolic values.

@carolynzech
Copy link
Contributor Author

I imagine that in the future, we would want to introduce a #[kani::autoharness] attribute that functions can be annotated with (or just allow #[kani::proof] directly on functions with arguments, see #1919). Perhaps, autoharness would add those annotations automatically.

You mean that we'd phase out --include-functions and --exclude-functions and just have people put this annotation on their functions instead (or support both)?

@zhassan-aws
Copy link
Contributor

What I was thinking is that functions annotated with #[kani::autoharness] would be targeted when we run cargo kani. The new subcommand cargo kani autoharness would target functions that do not have harness annotations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Z-BenchCI Tag a PR to run benchmark CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants