Describe the bug
Given the exact same code, I observed that the coordinate ordering of some_tensor.C can change depending on the device (CPU vs GPU). Is this expected?
To Reproduce
This script is the smallest example I could come up with:
import numpy as np
import MinkowskiEngine as ME
import torch


def reproduce(N, device):
    # Create x
    feats = torch.rand(N, 1).to(device)
    coords = torch.cat([torch.zeros((N, 1)), torch.rand(N, 3) * 100], dim=1).to(device)
    x = ME.SparseTensor(features=feats, coordinates=coords)

    # Create mask
    mask = (torch.rand(N, 6) > 0.5).float().to(device)
    mask = ME.SparseTensor(
        coordinates=x.C,
        features=mask,
        coordinate_manager=x.coordinate_manager,
        tensor_stride=x.tensor_stride,
    )

    # Create x0
    x0 = ME.SparseTensor(
        coordinates=x.C,
        features=torch.zeros(x.F.shape[0], mask.F.shape[1]).to(device),
        coordinate_manager=x.coordinate_manager,
        tensor_stride=x.tensor_stride
    )

    # print(x.C, mask.C, x0.C)  # These are all identical
    print('Do x, mask and x0 have all the same coordinates ordering? ', (x0.C == x.C).all() and (x0.C == mask.C).all())

    # No a priori reason but this set of coordinates is ordered differently on CPU, and identical to the previous one on GPU
    # print((mask + x0).C)
    print('Do mask + x0 and x0 have the same coordinates ordering? ', ((mask + x0).C == x0.C).all())


if __name__ == '__main__':
    print('Testing on CPU')
    reproduce(10, 'cpu')
    print('Testing on GPU')
    reproduce(10, 'cuda:0')
The output that I get is:
Testing on CPU
Do x, mask and x0 have all the same coordinates ordering? tensor(True)
Do mask + x0 and x0 have the same coordinates ordering? tensor(False)
Testing on GPU
Do x, mask and x0 have all the same coordinates ordering? tensor(True, device='cuda:0')
Do mask + x0 and x0 have the same coordinates ordering? tensor(True, device='cuda:0')
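To check that the CPU result differs only in row order, and not in the set of coordinates itself, an order-insensitive comparison could be used. This is a minimal sketch, not part of MinkowskiEngine: `same_coordinate_set` is a hypothetical helper, and it assumes the coordinates have first been converted to plain Python lists (for example with `some_tensor.C.tolist()`).

```python
def same_coordinate_set(coords_a, coords_b):
    """Return True if both inputs hold the same coordinate rows, ignoring order.

    Each input is a sequence of rows, e.g. [[batch, x, y, z], ...].
    Hypothetical helper for illustration; not a MinkowskiEngine function.
    """
    # Sorting the rows as tuples removes any dependence on the original order
    # while still detecting missing, extra, or repeated rows.
    return sorted(map(tuple, coords_a)) == sorted(map(tuple, coords_b))


if __name__ == '__main__':
    # Made-up coordinates, only to exercise the helper.
    a = [[0, 1, 2, 3], [0, 4, 5, 6], [0, 7, 8, 9]]
    b = [[0, 7, 8, 9], [0, 1, 2, 3], [0, 4, 5, 6]]
    print('Same set of coordinates, ignoring order? ', same_coordinate_set(a, b))
```

In the script above this would be called as `same_coordinate_set((mask + x0).C.tolist(), x0.C.tolist())`, assuming both tensors are small enough to convert to lists.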
Expected behavior
If the coordinate ordering is going to change after a certain operation, I would expect the change to be consistent between CPU/GPU.
On GPU I have never seen the coordinate ordering change, which is what I initially expected. However, this comment suggests that this behavior is not guaranteed?
Desktop
==========System==========
Linux-3.10.0-1160.42.2.el7.x86_64-x86_64-with-glibc2.29
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.2 LTS"
3.8.10 (default, Jun 2 2021, 10:49:15)
[GCC 9.4.0]
==========Pytorch==========
1.9.0+cu111
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 470.82.01
CUDA Version 11.4
VBIOS Version 90.02.30.40.85
Image Version G001.0000.02.04
GSP Firmware Version N/A
==========NVCC==========
/usr/local/cuda/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
==========CC==========
/usr/bin/c++
c++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
==========MinkowskiEngine==========
0.5.4
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 11010
CUDART version MinkowskiEngine is compiled: 11010
Additional context
We heavily rely on the coordinate ordering of some_tensor.C for various operations such as masking. This "bug" (feature?) currently prevents our code from working on CPU. This was not an issue in the past, but I have not pinpointed whether a specific version of ME introduced this behavior.
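For context, the kind of workaround we would otherwise need is to realign rows by coordinate value instead of by position. The sketch below is only an illustration under stated assumptions: `alignment_permutation` is a hypothetical helper (not a MinkowskiEngine API), it works on plain Python lists such as those returned by `some_tensor.C.tolist()`, and it assumes every coordinate row is unique.

```python
def alignment_permutation(reference, other):
    """Return perm such that other[perm[i]] == reference[i] for every row i.

    Hypothetical helper for illustration; not a MinkowskiEngine function.
    Both inputs are sequences of coordinate rows, e.g. [[batch, x, y, z], ...].
    """
    # Map each coordinate row of `other` to its row index.
    index_of = {tuple(row): j for j, row in enumerate(other)}
    if len(index_of) != len(other):
        raise ValueError("duplicate coordinates in `other`")
    try:
        # Look up where each reference row sits in `other`.
        return [index_of[tuple(row)] for row in reference]
    except KeyError as missing:
        raise ValueError(f"coordinate {missing} not found in `other`") from None


if __name__ == '__main__':
    # Made-up coordinates, only to exercise the helper.
    reference = [[0, 1, 1, 1], [0, 2, 2, 2], [0, 3, 3, 3]]
    other = [[0, 3, 3, 3], [0, 1, 1, 1], [0, 2, 2, 2]]
    perm = alignment_permutation(reference, other)
    print('Rows of other, reordered to match reference: ', [other[j] for j in perm])
```

With the tensors from the script above, `perm = alignment_permutation(x0.C.tolist(), (mask + x0).C.tolist())` followed by `(mask + x0).F[perm]` should give features in the row order of `x0`. Doing this after every operation is costly, which is why consistent ordering between CPU and GPU matters to us.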
Referencing our code's original issue here.
Thank you!!