Fix ann-bench dataset blob integer overflow leading to incorrect data copy beyond 4B elems (#671)

ann-bench keeps data dimensions as `uint32_t`. We use `std::fread` to copy the data from a file to host memory and pass `n_rows * n_cols` as the element count, which is cast to `size_t` only after the multiplication has been done in 32 bits. For datasets larger than 2^32 (about 4B) elements, the product overflows and only part of the data is copied.

This PR fixes the bug by casting the dimensions before the multiplication.
The bug only affects benchmark cases where the data is requested in host memory that is not backed by a file.
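
The overflow described above can be reproduced in isolation. The sketch below is only an illustration, not code from cuVS: the helper names `n_elems_narrow` and `n_elems_wide` are hypothetical, and it assumes a platform where `int` is 32 bits wide and `size_t` is 64 bits wide, so that a `uint32_t` product wraps modulo 2^32 before it is widened.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 64-bit size_t, so the widened product cannot itself overflow here.
static_assert(sizeof(std::size_t) >= 8, "this sketch assumes a 64-bit size_t");

// Hypothetical helper mirroring the old behavior: the multiplication is done
// in 32-bit unsigned arithmetic and wraps modulo 2^32; only the already
// wrapped result is converted to size_t.
constexpr std::size_t n_elems_narrow(std::uint32_t n_rows, std::uint32_t n_cols)
{
  return n_rows * n_cols;
}

// Hypothetical helper mirroring the fix: both operands are widened first,
// so the multiplication is done in size_t.
constexpr std::size_t n_elems_wide(std::uint32_t n_rows, std::uint32_t n_cols)
{
  return static_cast<std::size_t>(n_rows) * static_cast<std::size_t>(n_cols);
}
```

With `n_rows = 2^22 + 1` and `n_cols = 1024`, the true element count is 2^32 + 1024. Under the stated assumptions, `n_elems_narrow` yields 1024 and `n_elems_wide` yields 4294968320, so a read sized by the narrow product would request only a small prefix of the dataset.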

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)

URL: #671
achirkin authored Feb 7, 2025
1 parent 4b289a0 commit f15c1ea
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion cpp/bench/ann/src/common/blob.hpp
@@ -453,7 +453,8 @@ struct blob_mmap {
 size_t size = data_end - data_start;
 mmap_owner owner{size, flags};
 std::fseek(file_.descriptor().value(), data_start, SEEK_SET);
-size_t n_elems = file_.rows_limit() * file_.n_cols();
+auto n_elems =
+  static_cast<size_t>(file_.rows_limit()) * static_cast<size_t>(file_.n_cols());
 if (std::fread(owner.data(), sizeof(T), n_elems, file_.descriptor().value()) != n_elems) {
   throw std::runtime_error{"cuvs::bench::blob_mmap() fread " + file_.path() + " failed"};
 }
