[ME library] CPU mode bug #91

Open
Temigo opened this issue Feb 9, 2022 · 0 comments

Temigo commented Feb 9, 2022

The full chain ME code is currently unreliable in CPU mode. We rely on the coordinate ordering to perform many operations, e.g. ghost masking or semantic segmentation masking. The crucial assumption that the coordinate ordering is preserved throughout the full chain operations holds on GPU but breaks down on CPU.

It first breaks down in PPN (in `models/layers/common/ppnplus.py`, class `AttentionMask`), and GraphSpice is strongly suspected to be affected as well.
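To make the failure mode concrete, here is a small NumPy illustration with made-up values (not code from the repository). A per-row mask built against one coordinate ordering is applied to features stored in another ordering. Nothing raises an error. The wrong rows are simply selected unless the mask is permuted together with the rows.

```python
import numpy as np

# Three voxels: (batch, x, y, z) coordinates with one feature each.
coords_a = np.array([[0, 1, 2, 3],
                     [0, 4, 5, 6],
                     [0, 7, 8, 9]])
feats_a = np.array([10.0, 20.0, 30.0])

# The same voxels after an operation that returned them in another order.
perm = np.array([2, 0, 1])
coords_b = coords_a[perm]
feats_b = feats_a[perm]

# Mask built against ordering A: keep only the voxel at (0, 4, 5, 6).
mask_a = np.array([False, True, False])

print(feats_a[mask_a])        # correct row: [20.]
print(feats_b[mask_a])        # silently wrong row: [10.]
print(feats_b[mask_a[perm]])  # mask permuted like the rows: [20.]
```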

How to reproduce the "bug"

This is the shortest example I could come up with.

```python
import numpy as np
import MinkowskiEngine as ME
import torch

# Parameters
N = 10
device = 'cuda:0'  # change this to 'cpu' to see the difference

# Create x: batch index 0 followed by random integer (x, y, z) coordinates
feats = torch.rand(N, 1).to(device)
coords = torch.cat([torch.zeros((N, 1)), torch.rand(N, 3) * 100], dim=1).int().to(device)
x = ME.SparseTensor(features=feats, coordinates=coords)

# Create mask: random binary features on the same coordinates as x
mask = (torch.rand(N, 6) > 0.5).float().to(device)
mask = ME.SparseTensor(
    coordinates=x.C,
    features=mask,
    coordinate_manager=x.coordinate_manager,
    tensor_stride=x.tensor_stride,
)

# Create x0: zero features on the same coordinates as x
x0 = ME.SparseTensor(
    coordinates=x.C,
    features=torch.zeros(x.F.shape[0], mask.F.shape[1]).to(device),
    coordinate_manager=x.coordinate_manager,
    tensor_stride=x.tensor_stride,
)
```

Now compare the coordinate tensors obtained through the `.C` attribute. On CPU, the order changes after the addition `mask + x0`:

```python
print(x.C, mask.C, x0.C)  # These are all identical
# For no apparent reason, these coordinates are ordered differently on CPU, but identically to the previous ones on GPU
print((mask + x0).C)
```
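One way to check that an operation merely reordered the coordinates (rather than changing them) is to compare the two coordinate tensors as sets of rows and recover the permutation between them. The sketch below uses a hypothetical helper, `align_permutation`, which is not part of MinkowskiEngine or this repository. It works on NumPy arrays (e.g. `x.C.cpu().numpy()`) and assumes every row (batch index plus spatial coordinates) is unique, which should hold for the coordinates of a single coordinate map.

```python
import numpy as np

def align_permutation(ref_coords, other_coords):
    """Return `perm` such that other_coords[perm] equals ref_coords row by row.

    Hypothetical helper. Assumes both inputs are integer arrays of shape (N, D)
    with unique rows, e.g. (batch, x, y, z) coordinates.
    Raises ValueError if the two arrays do not contain the same rows.
    """
    ref = np.asarray(ref_coords)
    other = np.asarray(other_coords)
    if ref.shape != other.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {other.shape}")
    # np.lexsort treats its LAST key as the primary one, so reverse the
    # columns to sort by column 0 (batch index) first, then x, y, z.
    ref_order = np.lexsort(ref.T[::-1])
    other_order = np.lexsort(other.T[::-1])
    if not np.array_equal(ref[ref_order], other[other_order]):
        raise ValueError("coordinates differ, not just their order")
    # ref[ref_order[k]] == other[other_order[k]] for every k, so the row of
    # `other` matching ref[i] is other_order[k] where ref_order[k] == i.
    perm = np.empty_like(ref_order)
    perm[ref_order] = other_order
    return perm


# Usage with made-up coordinates: `other` holds the rows of `ref` reordered.
ref = np.array([[0, 1, 2, 3], [0, 4, 5, 6], [1, 0, 0, 1]])
other = ref[[2, 0, 1]]
perm = align_permutation(ref, other)
print(perm)                              # [1 2 0]
print(np.array_equal(other[perm], ref))  # True
```

With `perm`, the features of the reordered tensor could be realigned with `feats_other[perm]`, but this would have to be repeated after every operation that may reorder coordinates.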

What does MinkowskiEngine say?

Well, they do not guarantee the coordinate ordering. See
https://github.com/NVIDIA/MinkowskiEngine/blob/master/MinkowskiEngine/MinkowskiTensor.py#L291

> The order of coordinates is non-deterministic within each batch.
> Use :attr:`decomposed_coordinates_and_features` to retrieve
> both coordinates features with the same order. To retrieve the
> order the decomposed coordinates is generated, use :attr:`decomposition_permutations`.

(I have to say, it is not 100% clear to me what `decomposition_permutations` is for, but it definitely does not let us retrieve the original coordinate ordering. Even if it did, it would be cumbersome to have to correct the ordering every now and then throughout the code.)
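My reading (an assumption, not confirmed against the MinkowskiEngine source) is that these per-batch permutations only group the rows of the *current* `.C` by batch index: they index into whatever order the tensor already has, which would explain why they cannot restore the input order. The NumPy sketch below mimics that reading with a hypothetical `batch_permutations` function. It illustrates the assumption and is not MinkowskiEngine's implementation.

```python
import numpy as np

def batch_permutations(coords):
    """Hypothetical analogue: for each batch index (column 0 of `coords`),
    return the row indices of `coords` that belong to that batch."""
    batch = coords[:, 0]
    return [np.nonzero(batch == b)[0] for b in np.unique(batch)]


coords = np.array([[1, 5, 5, 5],
                   [0, 1, 2, 3],
                   [1, 0, 0, 0],
                   [0, 9, 9, 9]])
perms = batch_permutations(coords)
print([p.tolist() for p in perms])  # [[1, 3], [0, 2]]

# Per-batch spatial coordinates, in the order the rows appear in `coords`.
decomposed = [coords[p, 1:] for p in perms]
```

If `coords` had been reordered by an earlier operation, these indices would simply follow the new order, so nothing in them records the order of the original input.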
