Request for Documentation on Optimal Thread and Job Allocation in Cargo Mutants #187

Closed · ASuciuX opened this issue Dec 15, 2023 · 9 comments
Labels: documentation (Improvements or additions to documentation)

Comments

@ASuciuX (Contributor) commented Dec 15, 2023

There appears to be a lack of documentation on the optimal usage of cargo mutants with respect to specifying the number of threads and jobs. Two critical aspects seem to be missing:

  1. A guideline on the ratio between the number of threads and jobs. It's not clear when to allocate more jobs versus more threads, and in what scenarios to prioritize threads over jobs for certain types of builds or tests.

  2. Recommendations based on CPU/RAM limits: Guidelines on determining the appropriate number of jobs and threads tailored to specific system specifications would be beneficial, ensuring that the tests run at full capacity without overloading the system.


Additionally, I'd like to say thank you for all the features in cargo-mutants and the ongoing development efforts.

@sourcefrog (Owner)

Hi, thanks for the suggestion and appreciation.

Just checking, did you already read https://mutants.rs/parallelism.html?

> A guideline on the ratio between the number of threads and jobs. It's not clear when to allocate more jobs versus more threads, and in what scenarios to prioritize threads over jobs for certain types of builds or tests.

Can you clarify what you mean by "allocating threads" here? There is really only one knob at the moment, -j to set the number of jobs.

> Recommendations based on CPU/RAM limits: Guidelines on determining the appropriate number of jobs and threads tailored to specific system specifications would be beneficial, ensuring that the tests run at full capacity without overloading the system.

I'd like to give recommendations on that, but it seems to really depend on how resource-hungry your test suite is, so it's hard to generalize. In the extreme case perhaps even one copy of the test suite is as much as will fit on a small laptop or VM.

At the moment the docs say:

> -j 4 may be a good starting point. Start there and watch memory and CPU usage, and tune towards a setting where all cores are fully utilized without apparent thrashing, memory exhaustion, or thermal issues.

In fact I now think even -j4 might be too high in some situations, and perhaps I'd revise it to 2 or 3.

I would welcome any suggestions.

sourcefrog added the documentation label on Dec 15, 2023
@ASuciuX (Contributor, Author) commented Dec 15, 2023

By threads, I was referring to test-threads; I've seen it in the docs, along with the note that the two options don't interact well together. Link to specific doc sentence

@ASuciuX (Contributor, Author) commented Dec 15, 2023

I'm keenly interested in optimizing the execution of cargo mutants for this specific package: stackslib on the develop branch of the stacks-core repository. With the latest version of mutants, it's generating around 11.5k mutants. My current setup involves a 32vCPU and 64GB RAM configuration, where I've been running it with cargo mutants --package stackslib -j 24 -- -- --test-threads 2. However, increasing these parameters seems to cause the terminal to crash.

If possible, I'm open to upgrading the specs, but my main goal is to find the most efficient setup to significantly reduce the processing time from weeks or months to just hours. Any guidance on the ideal specifications and command adjustments for this task would be greatly appreciated.

@ASuciuX (Contributor, Author) commented Dec 15, 2023

In addition to my previous query, I would also like to understand more about how cargo mutants utilizes CPU and memory resources, particularly in scenarios where there are spikes to 100% CPU and memory usage. I've experienced this firsthand both on the setup mentioned above and locally on a macOS M1 with 32GB RAM. In the local case, I started the process, stepped away for a while, and returned to find a low-memory warning still on screen, even though the system monitor showed only around 200MB of RAM usage at that point, a sharp contrast to the 30GB it had been consuming when the warning first appeared.

Could you provide insights into how cargo mutants manages memory and CPU during its execution? Specifically, does it cache builds/tests in memory for each job? How exactly does it allocate and utilize CPU resources for these processes? Understanding this could help in fine-tuning the system to handle the workload more efficiently, avoiding extreme resource consumption and system crashes.

@sourcefrog (Owner) commented Dec 15, 2023

Starting with the concrete questions: -j24 is way too high unless you have an enormous VM (which you don't). I would start at just -j2 and drop the --test-threads option. Your terminal is probably crashing because the machine's memory is exhausted.
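For example, a sketch of the adjusted invocation (same package as in your command above, everything else left at defaults):

# Start conservatively: two parallel jobs, default test threads
cargo mutants --package stackslib -j 2

Watch CPU and memory for a while before raising -j.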

I should probably make the manual more insistent about not setting it too high, and perhaps even give a warning in the UI if -j is too high.

Mentioning --test-threads is just an example of passing an argument to the test binary and not a suggestion that you should do it. Maybe it's a distracting example that should be removed.

  • Don't use --test-threads as an example
  • Warn if -j is too high (more than say 6)
  • Caution in the docs against setting -j too high

I see that package is pretty large (220 ksloc), so just to get started concretely you might want to use -f and other filter options to restrict your experiments to some files or directories, as shown below. In the longer term, you could think about splitting that one crate into several, which I guess is not worth it for cargo-mutants alone but perhaps would also help with CI and developer velocity generally.
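For instance, a narrowed run might look like this (the glob is purely illustrative; -f takes file names or glob patterns, so pick the files you actually care about):

# Only mutate functions under one subtree while experimenting
cargo mutants --package stackslib -f "src/net/**/*.rs" -j 2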

If you are running this on a Mac or Windows, it's worth excluding your terminal or IDE from Gatekeeper-style security controls, which can apparently have a pretty large effect: see #189.

I ran the tests from that package and it looks like they take about 6 minutes total under nextest, with some individual tests taking >60s.

I also saw some test failures with

called `Result::unwrap()` on an `Err` value: ClarityError(Interpreter(Interpreter(MarfFailure("Too many open files (os error 24

which may be a bug; or, if these tests really do need a lot of open files, it's a good example of a case where you might need to pass --test-threads through to the test binary to avoid resource exhaustion.
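The mechanism for that is the two "--" separators, as in the run shown further down:

# Everything after the first "--" goes to cargo test;
# everything after the second "--" goes to the test binary itself.
cargo mutants --package stackslib -j 2 -- -- --test-threads 1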

There are some other flakes or failures too, which I guess is going to reduce my ability or enthusiasm to use this crate as a test case. If they are flaking in your environment then the mutants output probably won't be reliable either.

I realize those slow tests might be important and hard to speed up, but if you can, that would probably be the thing that helps most. Or, if these slow tests are more of a defense in depth and not relied on to catch every mutant or needing to run on every commit, you might think about marking them #[ignore] and only running them from CI.
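As a sketch of that split, the standard test harness can run only the ignored tests from a scheduled CI job (nextest has a similar --run-ignored option, if I remember its flags right):

# Day-to-day (and under cargo-mutants): #[ignore] tests are skipped
cargo test
# Scheduled CI job: run only the slow, ignored tests
cargo test -- --ignored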

As a ballpark, in a tree that generates 11434 mutants and has a test suite that takes 6 minutes, I'd expect it to take at most 11434 × 6 min ≈ 48 days 😱 to test them all, ignoring the time for the incremental builds. Actually, some will be unviable, which will cut that time maybe in half, but it's still a lot, and I'd like to bring it down.

In general I think Rust builds and tests are faster on Linux, and it's easier to use a ramdisk, so running in a beefy Linux VM would be one easy way to speed it up. I have an idea to allow sharding across multiple VMs, too, #192, so then you can get answers faster by spending money; this should parallelize almost perfectly to the point where 10k VMs might test everything in 10-20 minutes on this tree.


More generally:

The cargo-mutants process itself should be using very little memory and CPU. Of course there might be bugs, but structurally it's not doing very much other than parsing the source once, writing out updates, and spawning other programs.

99+% of the resource usage comes from it running cargo build and cargo test once for every one of the many mutants that it generates. The resource consumption, I would expect, would be quite similar to what you'd see from a shell script that just repeatedly edits one file, runs a build, and then runs the tests. Or, if you give -j, that many shell scripts running in parallel.
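Roughly this, as an analogy only (apply_mutant and revert_mutant are hypothetical stand-ins, not real commands):

for m in mutant1 mutant2 mutant3; do
  apply_mutant "$m"            # edit one source file in place
  if cargo build --tests; then
    cargo test                 # a failing test means the mutant is caught
  fi                           # a failing build means the mutant is unviable
  revert_mutant "$m"           # restore the original source
done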

So it's not really managing or allocating memory or CPU, any more than that shell script would be. Both these subprocesses are somewhat black boxes for cargo-mutants: it can't know a priori how much your tests will use when they run, or control how much RAM and CPU they use.

Both the build tool and your own tests can be very hungry for CPU and RAM, and very spiky in how they use it. The spikes might account for your computer popping up a warning but then looking fine by the time you look at it.

Both the build and test are typically trying to schedule many internal tasks in parallel, but they commonly go through phases where there is just one straggler using only one core. The point of -j in cargo-mutants is to avoid idling the other cores during those periods.

> Specifically, does it cache builds/tests in memory for each job?

Not really, it runs cargo build or test separately for each job. The cargo build will be an incremental build, after the first one, so it's not doing everything from scratch.

It would be amazing if we could work out which tests possibly need to be run for any code change, perhaps by something like call graph analysis. It might have to look beyond direct reachability to catch changes due to side effects. But, if we could see that a particular test can never possibly execute a certain bit of code, then we could skip running that test. It seems possible in theory but I don't know yet how to get the information. The pragmatic answer is smaller crates which each have their own test coverage.

On Linux it's good to put the tempdir in a ramdisk, which I guess you can see as a kind of caching.
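For example, on Linux, assuming the temporary copies of the tree land in the directory named by TMPDIR (which I believe the tempfile machinery honors), something like:

# Create a 32 GB ramdisk and point the scratch copies of the tree at it
sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=32G tmpfs /mnt/ramdisk
TMPDIR=/mnt/ramdisk cargo mutants --package stackslib -j 2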

The idea we discussed elsewhere of running tests one at a time rather than through cargo test is interesting; not only would it let us stop when something fails, but it might let us prioritize the tests most likely to fail.

  • Move some of this explanation into the docs

@sourcefrog (Owner) commented Dec 15, 2023

For what it's worth I did get it running on that tree and it found some missed mutants. It is certainly projecting to take a long time. Your tree seems to need --test-threads 1 to pass on my mac, which is probably not helping performance.

; ~/src/mutants/target/release/cargo-mutants mutants -vV -d ~/src/stacks-core/stackslib/ -- -- --test-threads 1
Found 11434 mutants to test
ok         Unmutated baseline in 213.9s build + 1048.0s test
Auto-set test timeout to 5239.8s
unviable   stackslib/src/burnchains/bitcoin/spv.rs:228:9: replace SpvClient::tx_begin -> Result<DBTx<'a>, btc_error> with Ok(Default::default()) in 4.9s build
unviable   stackslib/src/net/relay.rs:1349:13: replace || with != in Relayer::process_mined_problematic_blocks in 4.0s build
unviable   stackslib/src/net/api/postmempoolquery.rs:234:9: replace <impl HttpRequest for RPCMempoolQueryRequestHandler>::try_parse_request -> Result<HttpRequestContents, Error> with Ok(Default::default()) in 4.1s build
unviable   stackslib/src/net/download.rs:2356:9: replace PeerNetwork::download_blocks -> Result<(bool, bool, Option<PoxId>, Vec<(ConsensusHash, StacksBlock, u64)>, Vec<(ConsensusHash, Vec<StacksMicroblock>, u64)>, Vec<usize>, Vec<NeighborKey>,), net_error, > with Ok((true, true, None, vec![], vec![(Default::default(), vec![], 0)], vec![], vec![])) in 4.3s build
caught     stackslib/src/net/p2p.rs:2657:9: replace PeerNetwork::public_ip_reset with () in 43.8s build + 895.1s test
caught     stackslib/src/net/relay.rs:1509:21: replace && with != in Relayer::filter_problematic_transactions in 44.9s build + 1004.7s test
caught     stackslib/src/clarity_cli.rs:798:24: replace < with > in consume_arg in 15.4s build + 1060.8s test
unviable   stackslib/src/clarity_vm/database/mod.rs:665:9: replace <impl ClarityBackingStore for MemoryBackingStore>::set_block_hash -> InterpreterResult<StacksBlockId> with InterpreterResult::from(Default::default()) in 4.5s build
test       stackslib/src/burnchains/burnchain.rs:233:17: replace && with != in BurnchainStateTransition::from_block_ops ... 206.0s
└          test chainstate::stacks::index::cache::test::test_marf_node_cache_node256_deferred_15500 ...
◤ 8/11434 mutants tested, 0% done, 3 caught, 5 unviable, 76:48 elapsed, about 79652 min remaining

@wileyj commented Dec 16, 2023

> For what it's worth I did get it running on that tree and it found some missed mutants. It is certainly projecting to take a long time. [...]

I would say this issue can probably be closed (see #186 (comment)), since documenting optimal threads/jobs is very dependent on the host OS. I'm not sure there could be an effective change to make it any clearer than the docs currently are.

@sourcefrog (Owner)

I think it can be clearer, actually: check out https://github.com/sourcefrog/cargo-mutants/pull/197/files?short_path=87dcc8f#diff-87dcc8f7e6732969e3a8b77e8e5ca6f99f6b88b834480dccc7bdb9c174c117dd

But after that, let's take the broader discussion to a GitHub Discussion, since it's not exactly a single issue.

@sourcefrog (Owner)

I improved this in #197; let's continue in the Q&A forum.
