
Debug result is OK, release gets NaN output #23440

Open
kassinvin opened this issue Jan 21, 2025 · 4 comments

kassinvin commented Jan 21, 2025

Describe the issue

Platform
windows 10

Visual Studio Version
17

ONNX Runtime Installation
download from released package


ONNX Runtime Version or Commit ID
1.15.1

ONNX Runtime API
C++

Architecture
X64

Execution Provider
GPU RTX 2080

Model
SAM Decoder

Result

Debug: [screenshot showing valid output values]

Release: [screenshot showing NaN output values]

To reproduce

Code

// Copies a vector into a freshly allocated raw buffer.
// NOTE: the caller owns `blob` and must delete[] it once the tensor is no longer needed.
void prepare_tensor(const std::vector<float> &inputValues, float*& blob)
{
	size_t nSize = inputValues.size();
	blob = new float[nSize];
	std::copy(inputValues.begin(), inputValues.end(), blob);
}

env = Ort::Env(ORT_LOGGING_LEVEL_WARNING, "ONNX_SAM");
sessionOptions = Ort::SessionOptions();
sessionOptions.SetIntraOpNumThreads(1);
sessionOptions.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_EXTENDED);
Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_CUDA(sessionOptions, 0));
sessionPoint = Ort::Session(env, L"H:\\LABEL_DECODER_POINT.onnx", sessionOptions);

const char *inputNamesSam[6]{ "image_embeddings", "point_coords", "point_labels", "mask_input", "has_mask_input", "orig_im_size" };
const char *outputNamesSam[3]{ "masks", "iou_predictions", "low_res_masks" };

std::tuple<int, int> newShape = GetPreProcessShape(OriImgSize.second, OriImgSize.first, long_side_length);

float new_w = static_cast<float>(std::get<1>(newShape));
float new_h = static_cast<float>(std::get<0>(newShape));

float ratio_x = (new_w / OriImgSize.first);
float ratio_y = (new_h / OriImgSize.second);

std::vector<float> inputPointValues, inputLabelValues;

bool bRect = false;
for (size_t i = 0; i < pts.size(); i++)
{
		float new_x = pts[i].m_nX * ratio_x;
		float new_y = pts[i].m_nY * ratio_y;
		if (pts[i].m_nType == 0) // negative point
		{
			inputPointValues.push_back(new_x);
			inputPointValues.push_back(new_y);
			inputLabelValues.push_back(0);
		}
		else // positive point
		{
			inputPointValues.push_back(new_x);
			inputPointValues.push_back(new_y);
			inputLabelValues.push_back(1);
		}
}

if (!bRect)
{
		inputPointValues.push_back(0.0f);
		inputPointValues.push_back(0.0f);
		inputLabelValues.push_back(-1);
}

const int numPoints = static_cast<int>(inputLabelValues.size());
std::vector<int64_t> inputPointShape = { 1, numPoints, 2 }, pointLabelsShape = { 1, numPoints },
		maskInputShape = { 1, 1, 256, 256 }, hasMaskInputShape = { 1 },
		origImSizeShape = { 2 }, inputEmbeddingShape = { 1, 256, 64, 64 };

std::vector<Ort::Value> inputTensorsSam;

float* blob_x = nullptr;
float* blob_y = nullptr;

// imageEmbeddingValue must point to a pre-allocated buffer of
// 1 * 256 * 64 * 64 = 1048576 floats before this copy.
std::copy(vImageCache.begin(), vImageCache.end(), imageEmbeddingValue);

inputTensorsSam.push_back(Ort::Value::CreateTensor<float>(
		memoryInfo, imageEmbeddingValue, 1048576,
		inputEmbeddingShape.data(), inputEmbeddingShape.size()));

prepare_tensor(inputPointValues, blob_x);
inputTensorsSam.push_back(Ort::Value::CreateTensor<float>(memoryInfo, blob_x,
		2 * numPoints, inputPointShape.data(),
		inputPointShape.size()));

prepare_tensor(inputLabelValues, blob_y);
inputTensorsSam.push_back(Ort::Value::CreateTensor<float>(memoryInfo, blob_y,
		numPoints, pointLabelsShape.data(),
		pointLabelsShape.size()));
const size_t maskInputSize = 256 * 256;
float maskInputValues[maskInputSize],
		hasMaskValues[] = { 0 },
		orig_im_size_values[] = { (float)OriImgSize.second, (float)OriImgSize.first };

if (vLastMaskCache.size() == 0)
{
		// memset counts bytes, not elements: zero the full buffer,
		// otherwise 3/4 of mask_input is left uninitialized (garbage in release builds).
		memset(maskInputValues, 0, maskInputSize * sizeof(float));
}
else
{
	hasMaskValues[0] = 1;
	std::transform(vLastMaskCache.begin(), vLastMaskCache.end(), maskInputValues, [](const float x)
	{
			return x;
	});
}

inputTensorsSam.push_back(
		Ort::Value::CreateTensor<float>(memoryInfo, maskInputValues, maskInputSize,
			maskInputShape.data(), maskInputShape.size()));

inputTensorsSam.push_back(Ort::Value::CreateTensor<float>(
		memoryInfo, hasMaskValues, 1, hasMaskInputShape.data(), hasMaskInputShape.size()));

inputTensorsSam.push_back(Ort::Value::CreateTensor<float>(
		memoryInfo, orig_im_size_values, 2, origImSizeShape.data(), origImSizeShape.size()));

Ort::RunOptions runOptionsSam;
std::vector<Ort::Value> outputTensorsSam;

outputTensorsSam = sessionPoint.Run(runOptionsSam, inputNamesSam, inputTensorsSam.data(),
			inputTensorsSam.size(), outputNamesSam, 3);

auto masks = outputTensorsSam[0].GetTensorMutableData<float>();
auto iou_predictions = outputTensorsSam[1].GetTensorMutableData<float>();
auto low_res_masks = outputTensorsSam[2].GetTensorMutableData<float>();

for (int ii = 0; ii < 10; ii++)
{
		LOGPRINTF(stderr, "mask output values %d : %f\n", ii, masks[ii]);
}

Urgency

No response

Platform

Windows

OS Version

10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.15.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.1

@eKevinHoang

Please reduce the optimization level from -O3 to -O2 or -O1 and report the results.
If the issue persists, consider updating ONNX Runtime to version 1.20.1 to check for any differences.

@kassinvin
Author

> Please reduce the optimization level from -O3 to -O2 or -O1 and report the results. If the issue persists, consider updating ONNX Runtime to version 1.20.1 to check for any differences.

Tested with -Od, -O1, -O2, and -Ox; still getting NaN.

@kassinvin
Author

> Please reduce the optimization level from -O3 to -O2 or -O1 and report the results. If the issue persists, consider updating ONNX Runtime to version 1.20.1 to check for any differences.

I tested ONNX Runtime versions 1.18.1 and 1.20.1 with Visual Studio 2022 at the -O2 optimization level and got the same result.
The 1.20.1 release package uses CUDA 12, which I don't have installed, so I tested on the CPU instead; the result is the same.

@eKevinHoang

@kassinvin It seems the issue might be caused by the input or output data variables not being allocated sufficient memory or being cleared unintentionally.

If you could provide a complete test code, it would make it easier for everyone to assist you.
