
Shape inference failure due to the move pad preseg pass #3597

Closed
naoyam opened this issue Dec 16, 2024 · 0 comments · Fixed by #3598


naoyam commented Dec 16, 2024

Encountered while testing #3556

Repro:

TEST_F(MovePadTest, Issue) {
  auto fusion_ptr = std::make_unique<Fusion>();
  Fusion& fusion = *fusion_ptr;
  FusionGuard fg(fusion_ptr.get());

  auto tv0 = makeSymbolicTensor(2);
  fusion.addInput(tv0);
  auto tv1 = makeSymbolicTensor(2);
  fusion.addInput(tv1);

  auto tv2 = slice(tv0, {{fusion.oneVal(), tv0->axis(0)->extent()},
                         {fusion.zeroVal(), tv0->axis(1)->extent()}});
  auto tv3 = segment_set(tv2);

  auto tv4 = add(tv3, tv1);
  auto tv5 = pad(tv4, {fusion.zeroVal(), fusion.oneVal()});
  auto tv6 = set(tv5);
  fusion.addOutput(tv6);

  auto options = at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0);
  auto t0 = at::randn({5, 10}, options);
  auto t1 = at::randn({4, 10}, options);
  std::vector<c10::IValue> inputs({t0, t1});

  FusionExecutorCache executor_cache(std::move(fusion_ptr));
  auto outputs = executor_cache.runFusionWithInputs(inputs);

  testValidate(
      executor_cache.fusion(), outputs, inputs, __LINE__, __FILE__);
}
Output:

[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from MovePadTest
[ RUN      ] MovePadTest.TMP
unknown file: Failure
C++ exception with description " INTERNAL ASSERT FAILED at "/home/nmaruyama/nvfuser/debug3/csrc/runtime/allocations.cpp":168, please report a bug with repro script to NVFuser at https://github.com/NVIDIA/Fuser/issues. Could not launch kernel as program could not infer i3(i3) for the buffer T6_g_float[iS14{i3}, iS19{( i2 + 1 )}]
Exception raised from inferShape at /home/nmaruyama/nvfuser/debug3/csrc/runtime/allocations.cpp:168 (most recent call first):

Here's the fusion after the preseg pass:

Fusion IR after pre-segmenter optimization passes:
Inputs:
  T0_g_float[iS0{i0}, iS1{i2}]
  T1_g_float[iS2{i3}, iS30{i2}]
Outputs:
  T6_g_float[iS14{i3}, iS19{( i2 + 1 )}]

%kernel {
T2_l_float[iS16{( i0 - 1 )}rf, iS6{i2}]
   = slice( T0_g_float[iS0{i0}, iS1{i2}], { {1, i0, 1} {0, i2, 1} } )
T3_l_float[iS18{( i0 - 1 )}, iS8{i2}]
   = SegmenterSet( T2_l_float[iS16{( i0 - 1 )}rf, iS6{i2}] )
T7_l_float[iS20{( i0 - 1 )}, iS22{( i2 + 1 )}rf]
   = pad( T3_l_float[iS18{( i0 - 1 )}, iS8{i2}], {0, 0, 0, 1} )
T8_l_float[iS23{i3}, iS25{( i2 + 1 )}rf]
   = pad( T1_g_float[iS2{i3}, iS30{i2}], {0, 0, 0, 1} )
T9_l_float[iS26{( i0 - 1 )}, iS27{( i2 + 1 )}]
   = T7_l_float[iS20{( i0 - 1 )}, iS22{( i2 + 1 )}rf]
   + T8_l_float[iS23{i3}, iS25{( i2 + 1 )}rf];
T10_l_float[iS28{( i0 - 1 )}, iS29{( i2 + 1 )}]
   = SegmenterSet( T9_l_float[iS26{( i0 - 1 )}, iS27{( i2 + 1 )}] )
T6_g_float[iS14{i3}, iS19{( i2 + 1 )}]
   = Set( T10_l_float[iS28{( i0 - 1 )}, iS29{( i2 + 1 )}], cache_op=Streaming )

Note that in the chain producing the output, the extent i3 appears only on T6 itself: its producer T10 has extent ( i0 - 1 ), since the add T9 took its shape from T7 rather than from T8. Nothing in T6's segment binds i3, so ExprEvaluator cannot infer the output buffer's size.

naoyam mentioned this issue Dec 16, 2024
naoyam added a commit that referenced this issue Dec 17, 2024: Fixes #3597. Also used more descriptive variable names to help read the code.