[Backend Configuration IIa] Add dataset identification tools #569

Merged · 40 commits · Nov 22, 2023

Commits
c4bca8a  port over tool function for defaults (CodyCBakerPhD, Sep 18, 2023)
38a1fa3  modify iterator as well (CodyCBakerPhD, Sep 18, 2023)
a981068  factor out backend config stuff to other PR (CodyCBakerPhD, Sep 18, 2023)
966592c  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Sep 18, 2023)
2df8c06  Merge branch 'new_backend_pydantic_backend_configuration_models' into… (CodyCBakerPhD, Sep 18, 2023)
2e7af84  Update CHANGELOG.md (CodyCBakerPhD, Sep 18, 2023)
85bc927  Update __init__.py (CodyCBakerPhD, Sep 18, 2023)
8307156  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Sep 18, 2023)
87bee35  Merge branch 'new_backend_pydantic_backend_configuration_models' into… (CodyCBakerPhD, Sep 18, 2023)
87d1116  Merge branch 'new_backend_pydantic_backend_configuration_models' into… (CodyCBakerPhD, Oct 4, 2023)
13c9b37  use dataset_name in DatasetInfo; other debugs (Oct 4, 2023)
b63161a  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Oct 4, 2023)
d55e2a2  remove comments (Oct 5, 2023)
4b848b0  Merge branch 'new_backend_default_dataset_configuration' of https://g… (Oct 5, 2023)
446d81d  fix conflict (CodyCBakerPhD, Nov 7, 2023)
3c7cde8  remove unused typing (CodyCBakerPhD, Nov 8, 2023)
b845ac6  improve error message and fix import test (CodyCBakerPhD, Nov 8, 2023)
5c7fb6b  add global static maps; further condense tests with parametrize (CodyCBakerPhD, Nov 8, 2023)
91aab8c  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Nov 8, 2023)
65eee6b  fix name (CodyCBakerPhD, Nov 8, 2023)
a0a4d07  Merge branch 'new_backend_default_dataset_configuration' of https://g… (CodyCBakerPhD, Nov 8, 2023)
673e2f9  Apply suggestions from code review (CodyCBakerPhD, Nov 20, 2023)
271660c  Merge branch 'main' into new_backend_default_dataset_configuration (CodyCBakerPhD, Nov 20, 2023)
8630316  PR suggestions (CodyCBakerPhD, Nov 21, 2023)
f7e1be6  Update src/neuroconv/tools/nwb_helpers/_dataset_configuration.py (CodyCBakerPhD, Nov 21, 2023)
46b8cdb  Merge branch 'main' into new_backend_default_dataset_configuration (CodyCBakerPhD, Nov 21, 2023)
6f0806a  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Nov 21, 2023)
b07a541  add IO to dataset config names (CodyCBakerPhD, Nov 21, 2023)
2bb867a  fix conflict (CodyCBakerPhD, Nov 21, 2023)
89915ab  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Nov 21, 2023)
bfe1049  fix minimal test (CodyCBakerPhD, Nov 21, 2023)
268e7e9  Merge branch 'new_backend_default_dataset_configuration' of https://g… (CodyCBakerPhD, Nov 21, 2023)
f1683fa  alter private method name (CodyCBakerPhD, Nov 21, 2023)
185a69d  add extra tests (CodyCBakerPhD, Nov 21, 2023)
6fad003  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Nov 21, 2023)
15d8aed  Merge branch 'main' into new_backend_default_dataset_configuration (h-mayorquin, Nov 22, 2023)
4f668ec  Merge branch 'main' into new_backend_default_dataset_configuration (CodyCBakerPhD, Nov 22, 2023)
5ce5914  add test for ragged tables; debug (CodyCBakerPhD, Nov 22, 2023)
8350d2e  Merge branch 'new_backend_default_dataset_configuration' of https://g… (CodyCBakerPhD, Nov 22, 2023)
3032755  adjust for cross-platform (CodyCBakerPhD, Nov 22, 2023)
9 changes: 4 additions & 5 deletions src/neuroconv/tools/nwb_helpers/_dataset_configuration.py
@@ -177,16 +177,15 @@ def get_default_dataset_io_configurations(
         if isinstance(neurodata_object, DynamicTable):
             dynamic_table = neurodata_object  # for readability

-            for column_name in dynamic_table.colnames:
-                candidate_dataset = dynamic_table[column_name].data  # VectorData object

Collaborator
Wait, so how are these two different?

Member Author
Yep 😅

Member Author
In particular, dynamic_table[column_name] calls __getitem__ on dynamic_table with key column_name. With plain dict behavior this would just return the stored value, but hdmf overrides it to resolve the downstream links, so that units["spike_times"][:] returns the actual list of spikes shaped by units x spikes_per_unit.
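
A minimal sketch of that difference, assuming only pynwb and numpy are installed (this toy Units table is for illustration, not part of the PR):

```python
import numpy as np
from pynwb.misc import Units

units = Units(name="units", description="")
units.add_unit(spike_times=np.array([0.0, 1.0, 2.0]))
units.add_unit(spike_times=np.array([3.0, 4.0]))

# __getitem__ resolves the ragged index: one sequence of spike times per unit,
# e.g. [[0.0, 1.0, 2.0], [3.0, 4.0]].
print(units["spike_times"][:])

# Iterating .columns instead yields the raw storage objects, each with its own
# .data: the flat VectorData (all 5 spike times concatenated) and the
# VectorIndex of per-unit end offsets ([3, 5]).
for column in units.columns:
    print(column.name, list(column.data))
```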

+            for column in dynamic_table.columns:
+                column_name = column.name
+                candidate_dataset = column.data  # VectorData object
                 if _is_dataset_written_to_file(
                     candidate_dataset=candidate_dataset, backend=backend, existing_file=existing_file
                 ):
                     continue  # skip

-                yield _get_dataset_metadata(
-                    neurodata_object=dynamic_table[column_name], field_name="data", backend=backend
-                )
+                yield _get_dataset_metadata(neurodata_object=column, field_name="data", backend=backend)
         else:
             # Primarily for TimeSeries, but also any extended class that has 'data' or 'timestamps'
             # The most common example of this is ndx-events Events/LabeledEvents types
@@ -9,6 +9,7 @@
 from pynwb.base import DynamicTable
 from pynwb.behavior import CompassDirection
 from pynwb.image import ImageSeries
+from pynwb.misc import Units
 from pynwb.testing.mock.base import mock_TimeSeries
 from pynwb.testing.mock.behavior import mock_SpatialSeries
 from pynwb.testing.mock.file import mock_NWBFile
@@ -93,6 +94,111 @@ def test_configuration_on_dynamic_table(iterator: callable, backend: Literal["hdf5", "zarr"]):
         assert dataset_configuration.filter_options is None


+@pytest.mark.parametrize("backend", ["hdf5", "zarr"])
+def test_configuration_on_ragged_units_table(backend: Literal["hdf5", "zarr"]):
+    nwbfile = mock_NWBFile()
+    units = Units(name="units", description="")
+
+    spike_times = np.array([0.0, 1.0, 2.0])
+    waveforms = np.array([[[1, 2, 3], [1, 2, 3], [1, 2, 3]], [[1, 2, 3], [1, 2, 3], [1, 2, 3]]])
+    units.add_unit(spike_times=spike_times, waveforms=waveforms)
+
+    spike_times = np.array([3.0, 4.0])
+    waveforms = np.array([[[4, 5], [4, 5], [4, 5]], [[4, 5], [4, 5], [4, 5]]])
+    units.add_unit(spike_times=spike_times, waveforms=waveforms)
+
+    nwbfile.units = units
+
+    dataset_configurations = list(get_default_dataset_io_configurations(nwbfile=nwbfile, backend=backend))
+
+    assert len(dataset_configurations) == 5
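+    # The five configurations below are the flat data columns plus their index
+    # columns: spike_times (3 + 2 = 5 values), spike_times_index (one end offset
+    # per unit), waveforms, waveforms_index, and waveforms_index_index.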

+    dataset_configuration = next(
+        dataset_configuration
+        for dataset_configuration in dataset_configurations
+        if dataset_configuration.dataset_info.location == "units/spike_times/data"
+    )
+    assert isinstance(dataset_configuration, DATASET_IO_CONFIGURATIONS[backend])
+    assert dataset_configuration.dataset_info.full_shape == (5,)
+    assert dataset_configuration.dataset_info.dtype == np.dtype("float64")
+    assert dataset_configuration.chunk_shape == (5,)
+    assert dataset_configuration.buffer_shape == (5,)
+    assert dataset_configuration.compression_method == "gzip"
+    assert dataset_configuration.compression_options is None
+
+    if backend == "zarr":
+        assert dataset_configuration.filter_methods is None
+        assert dataset_configuration.filter_options is None
+
+    dataset_configuration = next(
+        dataset_configuration
+        for dataset_configuration in dataset_configurations
+        if dataset_configuration.dataset_info.location == "units/spike_times_index/data"
+    )
+    assert isinstance(dataset_configuration, DATASET_IO_CONFIGURATIONS[backend])
+    assert dataset_configuration.dataset_info.full_shape == (2,)
+    assert dataset_configuration.dataset_info.dtype == np.dtype("uint8")
+    assert dataset_configuration.chunk_shape == (2,)
+    assert dataset_configuration.buffer_shape == (2,)
+    assert dataset_configuration.compression_method == "gzip"
+    assert dataset_configuration.compression_options is None
+
+    if backend == "zarr":
+        assert dataset_configuration.filter_methods is None
+        assert dataset_configuration.filter_options is None
+
+    dataset_configuration = next(
+        dataset_configuration
+        for dataset_configuration in dataset_configurations
+        if dataset_configuration.dataset_info.location == "units/waveforms/data"
+    )
+    assert isinstance(dataset_configuration, DATASET_IO_CONFIGURATIONS[backend])
+    assert dataset_configuration.dataset_info.full_shape == (12, 3)
+    assert dataset_configuration.dataset_info.dtype == np.dtype("int32")
+    assert dataset_configuration.chunk_shape == (12, 3)
+    assert dataset_configuration.buffer_shape == (12, 3)
+    assert dataset_configuration.compression_method == "gzip"
+    assert dataset_configuration.compression_options is None
+
+    if backend == "zarr":
+        assert dataset_configuration.filter_methods is None
+        assert dataset_configuration.filter_options is None
+
+    dataset_configuration = next(
+        dataset_configuration
+        for dataset_configuration in dataset_configurations
+        if dataset_configuration.dataset_info.location == "units/waveforms_index/data"
+    )
+    assert isinstance(dataset_configuration, DATASET_IO_CONFIGURATIONS[backend])
+    assert dataset_configuration.dataset_info.full_shape == (4,)
+    assert dataset_configuration.dataset_info.dtype == np.dtype("uint8")
+    assert dataset_configuration.chunk_shape == (4,)
+    assert dataset_configuration.buffer_shape == (4,)
+    assert dataset_configuration.compression_method == "gzip"
+    assert dataset_configuration.compression_options is None
+
+    if backend == "zarr":
+        assert dataset_configuration.filter_methods is None
+        assert dataset_configuration.filter_options is None
+
+    dataset_configuration = next(
+        dataset_configuration
+        for dataset_configuration in dataset_configurations
+        if dataset_configuration.dataset_info.location == "units/waveforms_index_index/data"
+    )
+    assert isinstance(dataset_configuration, DATASET_IO_CONFIGURATIONS[backend])
+    assert dataset_configuration.dataset_info.full_shape == (2,)
+    assert dataset_configuration.dataset_info.dtype == np.dtype("uint8")
+    assert dataset_configuration.chunk_shape == (2,)
+    assert dataset_configuration.buffer_shape == (2,)
+    assert dataset_configuration.compression_method == "gzip"
+    assert dataset_configuration.compression_options is None
+
+    if backend == "zarr":
+        assert dataset_configuration.filter_methods is None
+        assert dataset_configuration.filter_options is None


 @pytest.mark.parametrize("iterator", [lambda x: x, SliceableDataChunkIterator, DataChunkIterator])
 @pytest.mark.parametrize("backend", ["hdf5", "zarr"])
 def test_configuration_on_compass_direction(iterator: callable, backend: Literal["hdf5", "zarr"]):