diff --git a/CHANGELOG.md b/CHANGELOG.md
index 22c21b0e8..e72015601 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,7 @@
 ### Enhancements
 - Added `TermSetConfigurator` to automatically wrap fields with `TermSetWrapper` according to a configuration file. @mavaylon1 [#1016](https://github.com/hdmf-dev/hdmf/pull/1016)
+- Updated `TermSetWrapper` to support validating a single field within a compound array. @mavaylon1 [#1061](https://github.com/hdmf-dev/hdmf/pull/1061)
 
 ## HDMF 3.13.0 (March 20, 2024)
 
@@ -138,8 +139,8 @@ will increase the minor version number to 3.10.0. See the 3.9.1 release notes be
 ## HDMF 3.6.0 (May 12, 2023)
 
 ### New features and minor improvements
-- Updated `ExternalResources` to have `FileTable` and new methods to query data. the `ResourceTable` has been removed along with methods relating to `Resource`. @mavaylon [#850](https://github.com/hdmf-dev/hdmf/pull/850)
-- Updated hdmf-common-schema version to 1.6.0. @mavaylon [#850](https://github.com/hdmf-dev/hdmf/pull/850)
+- Updated `ExternalResources` to have `FileTable` and new methods to query data. The `ResourceTable` has been removed along with methods relating to `Resource`. @mavaylon1 [#850](https://github.com/hdmf-dev/hdmf/pull/850)
+- Updated hdmf-common-schema version to 1.6.0. @mavaylon1 [#850](https://github.com/hdmf-dev/hdmf/pull/850)
 - Added testing of HDMF-Zarr on PR and nightly. @rly [#859](https://github.com/hdmf-dev/hdmf/pull/859)
 - Replaced `setup.py` with `pyproject.toml`. @rly [#844](https://github.com/hdmf-dev/hdmf/pull/844)
 - Use `ruff` instead of `flake8`. @rly [#844](https://github.com/hdmf-dev/hdmf/pull/844)
@@ -153,7 +154,7 @@ will increase the minor version number to 3.10.0. See the 3.9.1 release notes be
   [#853](https://github.com/hdmf-dev/hdmf/pull/853)
 
 ### Documentation and tutorial enhancements:
-- Updated `ExternalResources` how to tutorial to include the new features. @mavaylon [#850](https://github.com/hdmf-dev/hdmf/pull/850)
+- Updated `ExternalResources` how-to tutorial to include the new features. @mavaylon1 [#850](https://github.com/hdmf-dev/hdmf/pull/850)
 
 ## HDMF 3.5.6 (April 28, 2023)
 
@@ -193,13 +194,13 @@ will increase the minor version number to 3.10.0. See the 3.9.1 release notes be
 ### Bug fixes
 - Fixed issue with conda CI. @rly [#823](https://github.com/hdmf-dev/hdmf/pull/823)
-- Fixed issue with deprecated `pkg_resources`. @mavaylon [#822](https://github.com/hdmf-dev/hdmf/pull/822)
-- Fixed `hdmf.common` deprecation warning. @mavaylon [#826]((https://github.com/hdmf-dev/hdmf/pull/826)
+- Fixed issue with deprecated `pkg_resources`. @mavaylon1 [#822](https://github.com/hdmf-dev/hdmf/pull/822)
+- Fixed `hdmf.common` deprecation warning. @mavaylon1 [#826](https://github.com/hdmf-dev/hdmf/pull/826)
 
 ### Internal improvements
 - A number of typos fixed and Github action running codespell to ensure that no typo sneaks in [#825](https://github.com/hdmf-dev/hdmf/pull/825) was added.
-- Added additional documentation for `__fields__` in `AbstactContainer`. @mavaylon [#827](https://github.com/hdmf-dev/hdmf/pull/827)
-- Updated warning message for broken links. @mavaylon [#829](https://github.com/hdmf-dev/hdmf/pull/829)
+- Added additional documentation for `__fields__` in `AbstractContainer`. @mavaylon1 [#827](https://github.com/hdmf-dev/hdmf/pull/827)
+- Updated warning message for broken links. @mavaylon1 [#829](https://github.com/hdmf-dev/hdmf/pull/829)
 
 ## HDMF 3.5.1 (January 26, 2023)
 
@@ -218,9 +219,9 @@ will increase the minor version number to 3.10.0. See the 3.9.1 release notes be
 - Added ``HDMFIO.__del__`` to ensure that I/O objects are being closed on delete. @oruebel [#811](https://github.com/hdmf-dev/hdmf/pull/811)
 
 ### Minor improvements
-- Added support for reading and writing `ExternalResources` to and from denormalized TSV files. @mavaylon [#799](https://github.com/hdmf-dev/hdmf/pull/799)
-- Changed the name of `ExternalResources.export_to_sqlite` to `ExternalResources.to_sqlite`. @mavaylon [#799](https://github.com/hdmf-dev/hdmf/pull/799)
-- Updated the tutorial for `ExternalResources`. @mavaylon [#799](https://github.com/hdmf-dev/hdmf/pull/799)
+- Added support for reading and writing `ExternalResources` to and from denormalized TSV files. @mavaylon1 [#799](https://github.com/hdmf-dev/hdmf/pull/799)
+- Changed the name of `ExternalResources.export_to_sqlite` to `ExternalResources.to_sqlite`. @mavaylon1 [#799](https://github.com/hdmf-dev/hdmf/pull/799)
+- Updated the tutorial for `ExternalResources`. @mavaylon1 [#799](https://github.com/hdmf-dev/hdmf/pull/799)
 - Added `message` argument for assert methods defined by `hdmf.testing.TestCase` to allow developers to include custom error messages with asserts. @oruebel [#812](https://github.com/hdmf-dev/hdmf/pull/812)
 - Clarify the expected chunk shape behavior for `DataChunkIterator`. @oruebel [#813](https://github.com/hdmf-dev/hdmf/pull/813)
 
@@ -361,7 +362,7 @@ the fields (i.e., when the constructor sets some fields to fixed values). @rly
 - Plotted results in external resources tutorial. @oruebel (#667)
 - Added support for Python 3.10. @rly (#679)
 - Updated requirements. @rly @TheChymera (#681)
-- Improved testing for `ExternalResources`. @mavaylon (#673)
+- Improved testing for `ExternalResources`. @mavaylon1 (#673)
 - Improved docs for export. @rly (#674)
 - Enhanced data chunk iteration speeds through new ``GenericDataChunkIterator`` class. @CodyCBakerPhD (#672)
 - Enhanced issue template forms on GitHub. @CodyCBakerPHD (#700)
@@ -437,7 +438,7 @@ the fields (i.e., when the constructor sets some fields to fixed values). @rly
 - Allow passing ``index=True`` to ``DynamicTable.to_dataframe()`` to support returning `DynamicTableRegion` columns as indices or
   Pandas DataFrame. @rly (#579)
 - Improve ``DynamicTable`` documentation. @rly (#639)
-- Updated external resources tutorial. @mavaylon (#611)
+- Updated external resources tutorial. @mavaylon1 (#611)
 
 ### Breaking changes and deprecations
 - Previously, when using ``DynamicTable.__getitem__`` or ``DynamicTable.get`` to access a selection of a
@@ -522,7 +523,7 @@ the fields (i.e., when the constructor sets some fields to fixed values). @rly
 - Add experimental namespace to HDMF common schema. New data types should go in the experimental namespace
   (hdmf-experimental) prior to being added to the core (hdmf-common) namespace. The purpose of this is to provide a
   place to test new data types that may break backward compatibility as they are refined. @ajtritt (#545)
-- `ExternalResources` was changed to support storing both names and URIs for resources. @mavaylon (#517, #548)
+- `ExternalResources` was changed to support storing both names and URIs for resources. @mavaylon1 (#517, #548)
 - The `VocabData` data type was replaced by `EnumData` to provide more flexible support for data from a set of
   fixed values.
 - Added `AlignedDynamicTable`, which defines a `DynamicTable` that supports storing a collection of sub-tables.
diff --git a/docs/gallery/plot_term_set.py b/docs/gallery/plot_term_set.py
index c1f7c7257..8bf2375aa 100644
--- a/docs/gallery/plot_term_set.py
+++ b/docs/gallery/plot_term_set.py
@@ -67,6 +67,7 @@
 """
 from hdmf.common import DynamicTable, VectorData
 import os
+import numpy as np
 
 try:
     import linkml_runtime  # noqa: F401
@@ -129,6 +130,19 @@
     data=TermSetWrapper(value=['Homo sapiens'], termset=terms)
 )
 
+######################################################
+# Validate Compound Data with TermSetWrapper
+# ----------------------------------------------------
+# :py:class:`~hdmf.term_set.TermSetWrapper` can also wrap compound data.
+# Specify the field within the compound data type that should be validated
+# against the termset.
+c_data = np.array([('Homo sapiens', 24)], dtype=[('species', 'U50'), ('age', 'i4')])
+data = VectorData(
+    name='species',
+    description='...',
+    data=TermSetWrapper(value=c_data, termset=terms, field='species')
+)
+
 ######################################################
 # Validate Attributes with TermSetWrapper
 # ----------------------------------------------------
diff --git a/src/hdmf/data_utils.py b/src/hdmf/data_utils.py
index 2df66106d..23f0b4019 100644
--- a/src/hdmf/data_utils.py
+++ b/src/hdmf/data_utils.py
@@ -20,7 +20,10 @@ def append_data(data, arg):
         data.append(arg)
         return data
     elif isinstance(data, np.ndarray):
-        return np.append(data, np.expand_dims(arg, axis=0), axis=0)
+        if len(data.dtype) > 0:  # data is a structured array
+            return np.append(data, arg)
+        else:  # arg is a scalar or row vector
+            return np.append(data, np.expand_dims(arg, axis=0), axis=0)
     elif isinstance(data, h5py.Dataset):
         shape = list(data.shape)
         shape[0] += 1
diff --git a/src/hdmf/term_set.py b/src/hdmf/term_set.py
index 1464f505c..0f42819b0 100644
--- a/src/hdmf/term_set.py
+++ b/src/hdmf/term_set.py
@@ -216,19 +216,26 @@ class TermSetWrapper:
             {'name': 'value', 'type': (list, np.ndarray, dict, str, tuple),
              'doc': 'The target item that is wrapped, either data or attribute.'},
+            {'name': 'field', 'type': str, 'default': None,
+             'doc': 'The field within a compound array.'}
     )
     def __init__(self, **kwargs):
         self.__value = kwargs['value']
         self.__termset = kwargs['termset']
+        self.__field = kwargs['field']
         self.__validate()
 
     def __validate(self):
-        # check if list, tuple, array
-        if isinstance(self.__value, (list, np.ndarray, tuple)):  # TODO: Future ticket on DataIO support
-            values = self.__value
-        # create list if none of those -> mostly for attributes
+        if self.__field is not None:
+            values = self.__value[self.__field]
         else:
-            values = [self.__value]
+            # check if list, tuple, array
+            if isinstance(self.__value, (list, np.ndarray, tuple)):
+                values = self.__value
+            # create list if none of those -> mostly for scalar attributes
+            else:
+                values = [self.__value]
+
+        # iteratively validate
         bad_values = []
         for term in values:
@@ -243,6 +250,10 @@ def __validate(self):
     def value(self):
         return self.__value
 
+    @property
+    def field(self):
+        return self.__field
+
     @property
     def termset(self):
         return self.__termset
@@ -273,26 +284,55 @@ def __iter__(self):
         """
         return self.__value.__iter__()
 
+    def __multi_validation(self, data):
+        """
+        Validate a collection of items in bulk and return the items not found in the termset.
+
+        `append_data` handles numpy arrays differently from lists: appending to a numpy array
+        behaves like a list extend, so appending a structured array (for compound data) can add
+        multiple items at once. This internal helper supports that by validating every item in
+        the collection, for both numpy-array appends and extends.
+        """
+        bad_values = []
+        for item in data:
+            if not self.termset.validate(term=item):
+                bad_values.append(item)
+        return bad_values
+
     def append(self, arg):
         """
         This append resolves the wrapper to use the append of the container using the wrapper.
         """
-        if self.termset.validate(term=arg):
-            self.__value = append_data(self.__value, arg)
+        if isinstance(arg, np.ndarray):
+            if self.__field is not None:  # compound array
+                values = arg[self.__field]
+            else:
+                msg = "Array needs to be a structured array with compound dtype. If this does not apply, use extend."
+                raise ValueError(msg)
         else:
-            msg = ('"%s" is not in the term set.' % arg)
+            values = [arg]
+
+        bad_values = self.__multi_validation(values)
+
+        if len(bad_values) != 0:
+            msg = ('"%s" is not in the term set.' % ', '.join([str(value) for value in bad_values]))
             raise ValueError(msg)
 
+        self.__value = append_data(self.__value, arg)
+
     def extend(self, arg):
         """
         This extend resolves the wrapper to use the extend of the container using the wrapper.
         """
-        bad_data = []
-        for item in arg:
-            if not self.termset.validate(term=item):
-                bad_data.append(item)
+        if isinstance(arg, np.ndarray) and self.__field is not None:  # compound array
+            values = arg[self.__field]
+        else:
+            values = arg
+
+        bad_data = self.__multi_validation(values)
 
         if len(bad_data)==0:
             self.__value = extend_data(self.__value, arg)
diff --git a/tests/unit/common/test_table.py b/tests/unit/common/test_table.py
index f2d03332f..00b3c14a3 100644
--- a/tests/unit/common/test_table.py
+++ b/tests/unit/common/test_table.py
@@ -220,6 +220,101 @@ def test_add_row_validate_bad_data_all_col(self):
         with self.assertRaises(ValueError):
             species.add_row(Species_1='bad data', Species_2='bad data')
 
+    def test_compound_data_append(self):
+        c_data = np.array([('Homo sapiens', 24)], dtype=[('species', 'U50'), ('age', 'i4')])
+        c_data2 = np.array([('Mus musculus', 24)], dtype=[('species', 'U50'), ('age', 'i4')])
+        compound_vector_data = VectorData(
+            name='Species_1',
+            description='...',
+            data=c_data
+        )
+        compound_vector_data.append(c_data2)
+
+        np.testing.assert_array_equal(compound_vector_data.data, np.append(c_data, c_data2))
+
+    @unittest.skipIf(not REQUIREMENTS_INSTALLED, "optional LinkML module is not installed")
+    def test_array_append_error(self):
+        c_data = np.array(['Homo sapiens'])
+        c_data2 = np.array(['Mus musculus'])
+
+        terms = TermSet(term_schema_path='tests/unit/example_test_term_set.yaml')
+        vectordata_termset = VectorData(
+            name='Species_1',
+            description='...',
+            data=TermSetWrapper(value=c_data, termset=terms)
+        )
+
+        with self.assertRaises(ValueError):
+            vectordata_termset.append(c_data2)
+
+    def test_compound_data_extend(self):
+        c_data = np.array([('Homo sapiens', 24)], dtype=[('species', 'U50'), ('age', 'i4')])
+        c_data2 = np.array([('Mus musculus', 24)], dtype=[('species', 'U50'), ('age', 'i4')])
+        compound_vector_data = VectorData(
+            name='Species_1',
+            description='...',
+            data=c_data
+        )
+        compound_vector_data.extend(c_data2)
+
+        np.testing.assert_array_equal(compound_vector_data.data, np.vstack((c_data, c_data2)))
+
+    @unittest.skipIf(not REQUIREMENTS_INSTALLED, "optional LinkML module is not installed")
+    def test_add_ref_wrapped_array_append(self):
+        data = np.array(['Homo sapiens'])
+        data2 = 'Mus musculus'
+        terms = TermSet(term_schema_path='tests/unit/example_test_term_set.yaml')
+        vector_data = VectorData(
+            name='Species_1',
+            description='...',
+            data=TermSetWrapper(value=data, termset=terms)
+        )
+        vector_data.append(data2)
+
+        np.testing.assert_array_equal(vector_data.data.data, np.append(data, data2))
+
+    @unittest.skipIf(not REQUIREMENTS_INSTALLED, "optional LinkML module is not installed")
+    def test_add_ref_wrapped_array_extend(self):
+        data = np.array(['Homo sapiens'])
+        data2 = np.array(['Mus musculus'])
+        terms = TermSet(term_schema_path='tests/unit/example_test_term_set.yaml')
+        vector_data = VectorData(
+            name='Species_1',
+            description='...',
+            data=TermSetWrapper(value=data, termset=terms)
+        )
+        vector_data.extend(data2)
+
+        np.testing.assert_array_equal(vector_data.data.data, np.vstack((data, data2)))
+
+    @unittest.skipIf(not REQUIREMENTS_INSTALLED, "optional LinkML module is not installed")
+    def test_add_ref_wrapped_compound_data_append(self):
+        c_data = np.array([('Homo sapiens', 24)], dtype=[('species', 'U50'), ('age', 'i4')])
+        c_data2 = np.array([('Mus musculus', 24)], dtype=[('species', 'U50'), ('age', 'i4')])
+        terms = TermSet(term_schema_path='tests/unit/example_test_term_set.yaml')
+        compound_vector_data = VectorData(
+            name='Species_1',
+            description='...',
+            data=TermSetWrapper(value=c_data, field='species', termset=terms)
+        )
+        compound_vector_data.append(c_data2)
+
+        np.testing.assert_array_equal(compound_vector_data.data.data, np.append(c_data, c_data2))
+
+    @unittest.skipIf(not REQUIREMENTS_INSTALLED, "optional LinkML module is not installed")
+    def test_add_ref_wrapped_compound_data_extend(self):
+        c_data = np.array([('Homo sapiens', 24)], dtype=[('species', 'U50'), ('age', 'i4')])
+        c_data2 = np.array([('Mus musculus', 24)], dtype=[('species', 'U50'), ('age', 'i4')])
+        terms = TermSet(term_schema_path='tests/unit/example_test_term_set.yaml')
+        compound_vector_data = VectorData(
+            name='Species_1',
+            description='...',
+            data=TermSetWrapper(value=c_data, field='species', termset=terms)
+        )
+        compound_vector_data.extend(c_data2)
+
+        np.testing.assert_array_equal(compound_vector_data.data.data, np.vstack((c_data, c_data2)))
+
     def test_constructor_bad_columns(self):
         columns = ['bad_column']
         msg = "'columns' must be a list of dict, VectorData, DynamicTableRegion, or VectorIndex"
diff --git a/tests/unit/test_term_set.py b/tests/unit/test_term_set.py
index 99bd6bf59..1d7721f1b 100644
--- a/tests/unit/test_term_set.py
+++ b/tests/unit/test_term_set.py
@@ -155,21 +155,22 @@ def setUp(self):
         self.wrapped_array = TermSetWrapper(value=np.array(['Homo sapiens']), termset=self.termset)
         self.wrapped_list = TermSetWrapper(value=['Homo sapiens'], termset=self.termset)
 
+        c_data = np.array([('Homo sapiens', 24)], dtype=[('species', 'U50'), ('age', 'i4')])
+        self.wrapped_comp_array = TermSetWrapper(value=c_data,
+                                                 termset=self.termset,
+                                                 field='species')
+
         self.np_data = VectorData(
             name='Species_1',
             description='...',
             data=self.wrapped_array
         )
-        self.list_data = VectorData(
-            name='Species_1',
-            description='...',
-            data=self.wrapped_list
-        )
 
     def test_properties(self):
         self.assertEqual(self.wrapped_array.value, ['Homo sapiens'])
         self.assertEqual(self.wrapped_array.termset.view_set, self.termset.view_set)
         self.assertEqual(self.wrapped_array.dtype, 'U12')  # this covers __getattr__
+        self.assertEqual(self.wrapped_comp_array.field, 'species')
 
     def test_get_item(self):
         self.assertEqual(self.np_data.data[0], 'Homo sapiens')
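The `append_data` change in this PR branches on `len(data.dtype)`, which counts the named fields of a structured (compound) dtype and is 0 for a plain array. A minimal NumPy-only sketch of why the two branches behave differently (the `compound`/`plain` variables here are illustrative, not part of the PR):

```python
import numpy as np

# A structured (compound) array: each element is a record with named fields.
compound = np.array([('Homo sapiens', 24)], dtype=[('species', 'U50'), ('age', 'i4')])
new_row = np.array([('Mus musculus', 12)], dtype=compound.dtype)

# len(dtype) counts named fields: 2 here, 0 for a plain ndarray. A structured
# array is a 1-D sequence of records, so a flat np.append adds the record(s).
if len(compound.dtype) > 0:
    appended = np.append(compound, new_row)

# Indexing with a field name yields just that column; this is the slice the
# wrapper validates against the termset when a `field` is set.
species_only = appended['species']

# Plain (non-compound) array: the pre-existing branch expands the new item to
# a row before appending along axis 0.
plain = np.array(['Homo sapiens'])
plain_appended = np.append(plain, np.expand_dims('Mus musculus', axis=0), axis=0)
```

Note that `np.append` without an `axis` argument flattens both inputs, which is exactly the list-extend-like behavior the new `__multi_validation` helper accounts for.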