-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix std::future status query #596
base: branch-25.04
Are you sure you want to change the base?
Fix std::future status query #596
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
/ok to test |
1 similar comment
/ok to test |
/ok to test |
Discussing further in #593 rather than here. |
aff60d5
to
caa1e6a
Compare
/ok to test |
/ok to test |
1 similar comment
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good @kingcrimsontianyu. The only question is if the overhead of creating the extra gather task is an issue. I suspect not, but it would be good to get it confirmed.
cc. @GregoryKimball, @vuule.
/** | ||
* @brief Submit the move-only task callable to the underlying thread pool. | ||
* | ||
* @tparam F Callable type. F shall be move-only and have no argument. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is tempting to add the "can't copy" part to the code to improve type safety:
template <typename F, std::enable_if_t<!std::is_copy_constructible_v<F>, bool> = true>
std::future<std::size_t> submit_move_only_task(F op_move_only)
But this does not work.
For the non-copyable gather_tasks
below, trying to copy construct results in compile error, but the type trait gives unexpected result.
auto gather_tasks_copy = gather_tasks; // Compile error as expected
static_assert(std::is_copy_constructible_v<decltype(gather_tasks)>); // No compile error. Surprise.
A simpler example:
std::vector<std::future<int>> move_only;
static_assert(std::is_copy_constructible_v<decltype(move_only)>); // No compile error. Surprise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good writeup: https://quuxplusone.github.io/blog/2020/02/05/vector-is-copyable-except-when-its-not/
41afba6
to
94c266e
Compare
I think this looks good overall to me now. Is it ready for a thorough review? I'm not sure if it is intentionally still in draft or not. |
Thanks for the reminder. Other than pending performance assessment, this PR should be ready for review. |
I'll use cuDF bench for performance assessment, and post the results here when ready. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I'll leave it to you to verify performance before merging.
cpp/include/kvikio/utils.hpp
Outdated
@@ -152,6 +153,7 @@ std::tuple<void*, std::size_t, std::size_t> get_alloc_info(void const* devPtr, | |||
template <typename T> | |||
bool is_future_done(T const& future) | |||
{ | |||
assert(future.valid()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a public API, do we want to throw if this condition is violated or are we happy with a debug assertion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I think we do need to check the pre-condition to avoid UB. Done.
/** | ||
* @brief Submit the move-only task callable to the underlying thread pool. | ||
* | ||
* @tparam F Callable type. F shall be move-only and have no argument. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good writeup: https://quuxplusone.github.io/blog/2020/02/05/vector-is-copyable-except-when-its-not/
94c266e
to
b519cc2
Compare
b519cc2
to
1691bd0
Compare
Performance comparison using libcudf's benchmark. Setup
parquet_read_io_compression
orc_read_io_compression
json_read_io
csv_read_io
|
Rerun the orc_read_io_compression cold cache case with
|
Increase the thread number N by 1, and compare
|
Terminology:
|
48db87b
to
050899c
Compare
@kingcrimsontianyu, so the benchmark are saying that for large IO the new PR is good and for small IO the new PR is bad? Could this be an ordering issue? |
|
||
// 1) Submit `task_size` sized tasks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about (possibly simpler):
std::vector<std::future<std::size_t>> tasks;
tasks.reserve(size / task_size);
while (size > task_size) {
tasks.push_back(detail::submit_task(op, buf, task_size, file_offset, devPtr_offset));
file_offset += task_size;
devPtr_offset += task_size;
size -= task_size;
}
auto last_task = [=, tasks = std::move(tasks)]() mutable -> std::size_t {
auto ret = op(buf, size, file_offset, devPtr_offset);
for (auto& task: tasks) {
ret += task.get();
}
return ret;
};
return detail::submit_move_only_task(std::move(last_task));
}
But, I have a question, why not also submit the final read task and just gather all the results in the last_task
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR initially made that implementation as you described: we submit the sequence of N tasks, and instead of using std::async(std::launch::deferred)
to launch the gather task as the dev branch does, we submit the gather task to the task queue of the thread pool. However, the performance regression was observed, somehow only on the ORC benchmark in libcudf. But we have noticed that if we let the last task do the gather operation (for the N-1 tasks), the regression would disappear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS: The suggested code is much more concise. Replaced as suggested. Thanks!
@madsbk We had a sync last week, and yes! this is what we suspected was happening to the ORC benchmark. I'll do some profiling this week to hopefully better understand its I/O pattern. |
e8455b0
to
359c19a
Compare
7671099
to
979ea1e
Compare
Pending performance regression check!!
This PR fixes the
std::future
status query according to the discussion in #593 .std::async
with the deferred launch policy. Doing so would cause the future status to be falsely reported as done. Instead, as suggested, they are created by the underlying thread pool (more specifically by the promise objects). This enables non-blocking future status query.std::async
with the deferred launch policy.