setitem with repeated indices is not guaranteed in practice #2855

Open
dcherian opened this issue Feb 21, 2025 · 3 comments
Labels
bug Potential issues with the zarr-python library

Comments

@dcherian
Contributor

dcherian commented Feb 21, 2025

Zarr version

v2.18.3, 3.0.2 and 3.0.3

Numcodecs version

?

Python Version

?

Operating System

linux & windows

Installation

either

Description

The behaviour of setitem with repeated indices depends on the platform. The example below succeeds on Mac but fails on Ubuntu and Windows with numpy 2.2. It succeeds in all CI environments with numpy 1.25.

I can reproduce on zarr v2.18.4, v3.0.2 on linux.

I say we disallow this

Steps to reproduce

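import zarr
import numpy as np

# One element per chunk, so each repeated index targets the same chunk twice.
array = zarr.array(data=np.zeros((4,)), chunks=(1,))
indexer = np.array([-1, -1, 0, 0])
array.oindex[(indexer,)] = [0, 1, 2, 3]
# Expected if the last assignment to each index wins: index 0 gets 3, index -1 gets 1.
np.testing.assert_array_equal(array[:], np.array([3, 0, 0, 1]))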

This fails on Windows and Linux with numpy 2.2 in CI.

cc @LDeakin @ilan-gold

@d-v-b
Contributor

d-v-b commented Feb 24, 2025

Is numpy's behavior also platform-dependent? If not, then I think we should copy numpy here. On my OSX machine, numpy uses the last value assigned:

>>> x = np.arange(4)
>>> x[np.ones(10, dtype='int')] = np.arange(10)
>>> x
array([0, 9, 2, 3]) # value at index 1 is set to the last value of np.arange(10)

We probably need to transform the input such that it defines an indexing operation that only applies 1 change to each output index.
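
A minimal sketch of one such transformation, using plain NumPy only. The helper name `dedupe_last_wins` and its `size` parameter are hypothetical and not part of zarr-python. It assumes a 1-D integer indexer and that "last value wins" is the desired semantics:

```python
import numpy as np


def dedupe_last_wins(indexer, values, size):
    """Hypothetical helper (not a zarr API): collapse repeated indices so
    each target index is written exactly once, keeping the last value."""
    indexer = np.asarray(indexer)
    values = np.asarray(values)
    # Map negative indices onto their non-negative equivalents.
    normalized = np.where(indexer < 0, indexer + size, indexer)
    # np.unique reports the first occurrence of each unique value; running
    # it on the reversed indexer therefore finds the last occurrence in
    # the original order.
    unique_idx, first_in_reversed = np.unique(normalized[::-1], return_index=True)
    last_positions = len(normalized) - 1 - first_in_reversed
    return unique_idx, values[last_positions]


# Same indexer and values as in the reproduction, applied to a NumPy array.
target = np.zeros(4)
idx, vals = dedupe_last_wins([-1, -1, 0, 0], [0, 1, 2, 3], size=target.shape[0])
target[idx] = vals
```

After deduplication every index appears once, so the result no longer depends on the order in which individual writes are applied. Whether the same pair can be passed straight to `array.oindex` is an assumption that would need checking against zarr itself.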

@LDeakin
Contributor

LDeakin commented Feb 24, 2025

> We probably need to transform the input such that it defines an indexing operation that only applies 1 change to each output index.

That'd be perfect

@dcherian
Contributor Author

numpy's behaviour is always consistent and pretty well-defined since it's serial.

I don't understand what's going on here. Transforming the input does seem sensible.
