-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#0: Improve error messages in Tensor storage classes #18321
base: main
Are you sure you want to change the base?
Conversation
ttnn/cpp/ttnn/tensor/storage.hpp
Outdated
@@ -200,7 +200,7 @@ struct MultiDeviceHostStorage { | |||
|
|||
TensorSpec get_tensor_spec(int spec_index) const { | |||
std::lock_guard<std::mutex> lock(mtx); | |||
TT_ASSERT(spec_index < specs.size(), "Buffer not found for device {}", spec_index); | |||
TT_ASSERT(spec_index < specs.size(), "Spec for device {} not found in spec list, were buffer and spec added?", spec_index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it be the exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be TT_FATAL
. I'd also argue "were buffer and spec added?" is more confusing for the end user to see than just a failing assertion.
Raises more questions unnecessarily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming by exception Denys means a TT_THROW, I personally think a FATAL would be more clear. Both would be preferable to the existing assert though.
f87408a
to
3b83b09
Compare
@@ -200,7 +200,7 @@ struct MultiDeviceHostStorage { | |||
|
|||
TensorSpec get_tensor_spec(int spec_index) const { | |||
std::lock_guard<std::mutex> lock(mtx); | |||
TT_ASSERT(spec_index < specs.size(), "Buffer not found for device {}", spec_index); | |||
TT_FATAL(spec_index < specs.size(), "Spec for device {} not found in spec list", spec_index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can make the same change in other places in the file, for consistency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good!
3b83b09
to
58e542a
Compare
ttnn/cpp/ttnn/tensor/storage.hpp
Outdated
@@ -325,7 +325,7 @@ struct MultiDeviceStorage { | |||
|
|||
inline std::shared_ptr<Buffer> get_buffer_for_device(IDevice* device) const { | |||
std::lock_guard<std::mutex> lock(buffer_mtx); | |||
TT_ASSERT(buffers.find(device->id()) != buffers.end(), "Buffer not found for device {}", device->id()); | |||
TT_FATAL(buffers.find(device->id()) != buffers.end(), "Buffer not found for device {}", device->id()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this one and below, you are performing redundant lookups in an unordered map. This involves hashing the key, and potentially iterating over a linked list of keys that hashes to the same value.
To avoid this, you can do:
auto buffer_it = buffers.find(device->id();
TT_FATAL(buffer_it != buffers.end(), ...)
// Perhaps we can make this TT_FATAL as well, but let's leave for now.
TT_ASSERT(...)
return buffer_it->second;
Also |
0b7f743
to
8a9e4b1
Compare
ttnn/cpp/ttnn/tensor/storage.hpp
Outdated
} | ||
|
||
inline std::shared_ptr<Buffer>& get_buffer_for_device(IDevice* device) { | ||
std::lock_guard<std::mutex> lock(buffer_mtx); | ||
TT_ASSERT(buffers.find(device->id()) != buffers.end(), "Buffer not found for device {}", device->id()); | ||
TT_FATAL(buffers.find(device->id()) != buffers.end(), "Buffer not found for device {}", device->id()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the same here and below in get_tensor_spec_for_device
ttnn/cpp/ttnn/tensor/storage.hpp
Outdated
TT_ASSERT( | ||
buffers.at(device->id())->device() == device, | ||
buffer_it->device() == device, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
buffer_it->second->device()
I believe? TT_ASSERT
is only compiled in debug mode, try to build with --debug
flag.
cf9d42e
to
4c9ed77
Compare
8eea513
to
81d6f70
Compare
81d6f70
to
4eeb969
Compare
Ticket
Link to Github Issue: N/A
Problem description
An error was a little unclear and potentially misleading and some breaking errors were marked as assert rather than fatal.
What's changed
Reworded and changed a spec assert to a TT_FATAL making the problem more clear for users who run into this error in the future.
Changed some buffer checks to FATAL.
Checklist