
Debug result is OK, release gets NaN output #23440

Open
kassinvin opened this issue Jan 21, 2025 · 4 comments

kassinvin commented Jan 21, 2025

Describe the issue

Platform
windows 10

Visual Studio Version
17

ONNX Runtime Installation
download from released package


ONNX Runtime Version or Commit ID
1.15.1

ONNX Runtime API
C++

Architecture
X64

Execution Provider
GPU RTX 2080

Model
SAM Decoder

Result

Debug: [screenshot showing valid output values]

Release: [screenshot showing NaN output values]

To reproduce

Code

// Copies a vector into a freshly allocated raw buffer.
// NOTE: the caller owns `blob` and must delete[] it once the tensor is no longer needed.
void prepare_tensor(const std::vector<float> &inputValues, float*& blob)
{
	size_t nSize = inputValues.size();
	blob = new float[nSize];
	std::copy(inputValues.begin(), inputValues.end(), blob);
}

env = Ort::Env(ORT_LOGGING_LEVEL_WARNING, "ONNX_SAM");
sessionOptions = Ort::SessionOptions();
sessionOptions.SetIntraOpNumThreads(1);
sessionOptions.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_EXTENDED);
Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_CUDA(sessionOptions, 0));
sessionPoint = Ort::Session(env, L"H:\\LABEL_DECODER_POINT.onnx", sessionOptions);

const char *inputNamesSam[6]{ "image_embeddings", "point_coords", "point_labels", "mask_input", "has_mask_input", "orig_im_size" };
const char *outputNamesSam[3]{ "masks", "iou_predictions", "low_res_masks" };

std::tuple<int, int> newShape = GetPreProcessShape(OriImgSize.second, OriImgSize.first, long_side_length);

float new_w = static_cast<float>(std::get<1>(newShape));
float new_h = static_cast<float>(std::get<0>(newShape));

float ratio_x = (new_w / OriImgSize.first);
float ratio_y = (new_h / OriImgSize.second);

std::vector<float> inputPointValues, inputLabelValues;

bool bRect = false;
for (size_t i = 0; i < pts.size(); i++)
{
		float new_x = pts[i].m_nX * ratio_x;
		float new_y = pts[i].m_nY * ratio_y;
		if (pts[i].m_nType == 0) // negative point
		{
			inputPointValues.push_back(new_x);
			inputPointValues.push_back(new_y);
			inputLabelValues.push_back(0);
		}
		else // positive point
		{
			inputPointValues.push_back(new_x);
			inputPointValues.push_back(new_y);
			inputLabelValues.push_back(1);
		}
}

if (!bRect)
{
		inputPointValues.push_back(0.0f);
		inputPointValues.push_back(0.0f);
		inputLabelValues.push_back(-1);
}

const int numPoints = static_cast<int>(inputLabelValues.size());
std::vector<int64_t> inputPointShape = { 1, numPoints, 2 }, pointLabelsShape = { 1, numPoints },
		maskInputShape = { 1, 1, 256, 256 }, hasMaskInputShape = { 1 },
		origImSizeShape = { 2 }, inputEmbeddingShape = { 1, 256, 64, 64 };

std::vector<Ort::Value> inputTensorsSam;

float* blob_x = nullptr;
float* blob_y = nullptr;

// imageEmbeddingValue must point to a pre-allocated buffer of
// 1 * 256 * 64 * 64 = 1048576 floats before this copy.
std::copy(vImageCache.begin(), vImageCache.end(), imageEmbeddingValue);

inputTensorsSam.push_back(Ort::Value::CreateTensor<float>(
		memoryInfo, imageEmbeddingValue, 1048576,
		inputEmbeddingShape.data(), inputEmbeddingShape.size()));

prepare_tensor(inputPointValues, blob_x);
inputTensorsSam.push_back(Ort::Value::CreateTensor<float>(memoryInfo, blob_x,
		2 * numPoints, inputPointShape.data(),
		inputPointShape.size()));

prepare_tensor(inputLabelValues, blob_y);
inputTensorsSam.push_back(Ort::Value::CreateTensor<float>(memoryInfo, blob_y,
		numPoints, pointLabelsShape.data(),
		pointLabelsShape.size()));
const size_t maskInputSize = 256 * 256;
float maskInputValues[maskInputSize],
		hasMaskValues[] = { 0 },
		orig_im_size_values[] = { (float)OriImgSize.second, (float)OriImgSize.first };

if (vLastMaskCache.size() == 0)
{
		// memset counts bytes, not elements: zero the full buffer,
		// otherwise 3/4 of mask_input is left uninitialized (garbage in release builds).
		memset(maskInputValues, 0, maskInputSize * sizeof(float));
}
else
{
	hasMaskValues[0] = 1;
	std::transform(vLastMaskCache.begin(), vLastMaskCache.end(), maskInputValues, [](const float x)
	{
			return x;
	});
}

inputTensorsSam.push_back(
		Ort::Value::CreateTensor<float>(memoryInfo, maskInputValues, maskInputSize,
			maskInputShape.data(), maskInputShape.size()));

inputTensorsSam.push_back(Ort::Value::CreateTensor<float>(
		memoryInfo, hasMaskValues, 1, hasMaskInputShape.data(), hasMaskInputShape.size()));

inputTensorsSam.push_back(Ort::Value::CreateTensor<float>(
		memoryInfo, orig_im_size_values, 2, origImSizeShape.data(), origImSizeShape.size()));

Ort::RunOptions runOptionsSam;
std::vector<Ort::Value> outputTensorsSam;

outputTensorsSam = sessionPoint.Run(runOptionsSam, inputNamesSam, inputTensorsSam.data(),
			inputTensorsSam.size(), outputNamesSam, 3);

auto masks = outputTensorsSam[0].GetTensorMutableData<float>();
auto iou_predictions = outputTensorsSam[1].GetTensorMutableData<float>();
auto low_res_masks = outputTensorsSam[2].GetTensorMutableData<float>();

for (int ii = 0; ii < 10; ii++)
{
		LOGPRINTF(stderr, "mask output values %d : %f\n", ii, masks[ii]);
}

Urgency

No response

Platform

Windows

OS Version

10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.15.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.1

@eKevinHoang

Please reduce the optimization level from -O3 to -O2 or -O1 and report the results.
If the issue persists, consider updating ONNX Runtime to version 1.20.1 to check for any differences.

@kassinvin
Author

> Please reduce the optimization level from -O3 to -O2 or -O1 and report the results. If the issue persists, consider updating ONNX Runtime to version 1.20.1 to check for any differences.

Tested with -Od, -O1, -O2, and -Ox; still getting NaN.

@kassinvin
Author

> Please reduce the optimization level from -O3 to -O2 or -O1 and report the results. If the issue persists, consider updating ONNX Runtime to version 1.20.1 to check for any differences.

I tested ONNX Runtime versions 1.18.1 and 1.20.1 with Visual Studio 2022 at the -O2 optimization level and got the same result.
The 1.20.1 release package uses CUDA 12, which I don't have installed, so I tested on the CPU instead; the result is the same.

@eKevinHoang

@kassinvin It seems the issue might be caused by the input or output data variables not being allocated sufficient memory or being cleared unintentionally.

If you could provide a complete test code, it would make it easier for everyone to assist you.
