-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Autoharness Subcommand #3874
base: main
Are you sure you want to change the base?
Autoharness Subcommand #3874
Conversation
We approximated an empty body by checking for TerminatorKind::Return in the first basic block, but it turns out some identity functions have this MIR as well, so remove it. One could argue that identity functions aren't worth checking either, but we'd rather check more than less for now.
8b85bee
to
f814676
Compare
f814676
to
bfebfdb
Compare
I think we'll have to find a way to report which functions were not considered, or else this creates a soundness risk: a code base may be fully covered by Kani in one version of the code base, then a change is made such that |
@tautschnig It does currently print every function that it did consider, so one could figure out which functions weren't considering by doing a (mental) set difference between all the functions in their crate versus what got printed. So we're not claiming to verify anything that we haven't; it's still on a function-by-function basis, and we print all the functions we considered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A high-level comment: the command name, "autoverify" might give a false impression that the functions that were covered have been verified, even though the harnesses only check the absence of panics and undefined behavior (at least for functions that do not have contracts), whereas "verified" is associated with proving their functional correctness. The same applies to the description in the new chapter.
I suggest we use a different name, e.g. "check-panic-ub" or just "auto"?
@zhassan-aws Point taken about the chapter wording--I can make it clearer that this since this subcommand generates a harness under the hood, it has the same guarantees as a harness, so users shouldn't take it to mean that they're magically getting anything stronger (i.e., functional correctness) by using it. I'm not sure I agree with the broader point about the "autoverify" name, though--our printed CBMC output refers to "verification results," so it seems like this name is consistent with that convention. That being said, I don't want to bikeshed this too much, so I'll think about some names... I don't love |
I like this better. |
Having changed the name to |
Sounds good. I can & will create a follow-up PR with UX improvements... I think getting this out and trying to apply to a library as big as the standard library will give us a lot of good ideas about things that can be improved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great so far! I mostly reviewed the tests in this round.
I suggest adding a test with dependencies to ensure we don't generate harnesses for functions outside the local crate.
Also, on the question of not terminating, I think we should run verification with a small default unwind value and a small default timeout and allow the user to control those.
tests/script-based-pre/cargo_autoharness_harnesses_fail/harnesses_fail.expected
Outdated
Show resolved
Hide resolved
Verification failed for - oob_unsafe_array_access | ||
Verification failed for - oob_safe_array_access | ||
Verification failed for - unchecked_mul | ||
Complete - 0 successfully verified functions, 5 failures, 5 total. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused about the output here. Does Kani run two times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it runs the automatic harnesses, then the manual harnesses. The purpose of this test is to check that automatic harnesses behave the same as manual ones, so we'd expect the output to look similar. Note that the automatic harness output talks about failed "functions" whereas the manual output talks about failed "harnesses."
The behavior of this subcommand is to run the automatic harnesses and the manual ones. The idea is that if you run autoharness, that's enough; you don't need to go run Kani again afterward to do the rest of your proofs.
See the comment at the top of the Rust file for this test:
// Test the bodies of the automatically generated harnesses:
// do they catch the same failures as manual ones?
// Note that this also indirectly tests that the automatic harness
// subcommand also runs the manual harnesses in the crate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ambivalent about also running the manual harnesses--I feel like this can fall into the bucket of "UX decision we'll revisit once we've tried using it and form opinions." Hard to know what customers will prefer w/o data.
tests/script-based-pre/cargo_autoharness_loops_fixme/config.yml
Outdated
Show resolved
Hide resolved
tests/script-based-pre/cargo_autoharness_loops_fixme/config.yml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine that in the future, we would want to introduce a #[kani::autoharness]
attribute that functions can be annotated with (or just allow #[kani::proof]
directly on functions with arguments, see #1919). Perhaps, autoharness would add those annotations automatically.
pub autoharness_included_functions: Vec<String>, | ||
/// If we are running the autoharness subcommand, the functions to exclude from autoverification | ||
#[arg(long = "autoharness-exclude-function", num_args(1))] | ||
pub autoharness_excluded_functions: Vec<String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option we can consider adding (perhaps in a follow-up PR) is one to only apply autoharness to public functions in a crate. Non-public functions often have pre-conditions, and may fail without constraints on the symbolic values.
You mean that we'd phase out |
What I was thinking is that functions annotated with |
This PR introduces an initial implementation for the
autoharness
subcommand and a book chapter describing the feature. I recommend reading the book chapter before reviewing the code (or proceeding further in this PR description).Callouts
--harness
is to manual harnesses what--include-function
and--exclude-function
are to autoharness; both allow the user to filter which harnesses/functions get verified. Their implementation is different, however----harness
is a driver-only flag, i.e., we still compile every harness, but then only call CBMC on the ones specified in--harness
.--include-function
and--exclude-function
get passed to the compiler. I made this design choice to try to be more aggressive about improving compiler performance--if a user only asks to verify one function and we go try to codegen a thousand, the compiler will take much longer than it needs to. I thought this more aggressive optimization was warranted given that crates are likely to have far many more functions eligible for autoverification than manual Kani harnesses.(See also the limitations in the book chapter).
Testing
Besides the new tests in this PR, I also ran against s2n-quic to confirm that the subcommand works on larger crates.
Towards #3832
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.