
add function to read adat files into widely used AnnData object #11

Open: wants to merge 2 commits into base: master

@megadesk commented Jan 3, 2025

Add a function to read ADAT files into the widely used AnnData object.

read_adat_2_AnnData()
    Return an AnnData object, instead of an Adat object, read from a SomaLogic ADAT file path or buffer.

    Parameters
    ----------
    path_or_buf : Union[str, TextIO]
        Path or buffer that the file will be read from.

    Returns
    -------
    AnnData
        AnnData object (instead of an Adat object).

    Examples
    --------
    >>> adata = read_adat_2_AnnData(example_data_file)

@megadesk changed the title from "add function to read read adat files into widely used AnnData object" to "add function to read adat files into widely used AnnData object" on Jan 3, 2025
@kyoung73 self-assigned this on Jan 6, 2025
@kyoung73 (Contributor) commented Jan 6, 2025

Hi @megadesk! First, thank you for your contribution! To keep the codebase lightweight, we would prefer not to introduce additional dependencies, such as AnnData, into the source code for now. In the future, we may consider adding AnnData as an optional dependency and including a function like this one. In the meantime, we recommend creating a helper function downstream (in your own code) that encapsulates the entire workflow in a single call. While slightly more convoluted, the following approach demonstrates how this can be achieved using read_adat:

import somadata as sd
import anndata as ad

# Read the ADAT into an Adat (a pandas DataFrame subclass with MultiIndex rows and columns)
adat = sd.read_adat("tests/data/control_data.adat")

# Each row-index level becomes an obs field, each column-index level a var field,
# and header metadata values are stored as strings in uns
row_metadata = {name: adat.index.get_level_values(name).tolist() for name in adat.index.names}
column_metadata = {name: adat.columns.get_level_values(name).tolist() for name in adat.columns.names}
header_metadata = {key: str(value) for key, value in adat.header_metadata.items()}

adata = ad.AnnData(X=adat.values, obs=row_metadata, var=column_metadata, uns=header_metadata)
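Following the suggestion to wrap this workflow in a downstream helper, the sketch below packages the same steps into one function. The name `adat_to_anndata_kwargs` is hypothetical, and the sketch assumes, as the example above does, that the Adat behaves like a pandas DataFrame with named MultiIndex levels on both axes. It returns keyword arguments rather than an AnnData object, so the helper itself only needs pandas.

```python
from typing import Any, Dict, Mapping

import pandas as pd


def adat_to_anndata_kwargs(frame: pd.DataFrame, header_metadata: Mapping[str, Any]) -> Dict[str, Any]:
    """Flatten a MultiIndex DataFrame into keyword arguments for anndata.AnnData.

    Hypothetical helper: every row-index level becomes an obs field, every
    column-index level becomes a var field, and header values are stored as
    strings in uns, mirroring the read_adat example.
    """
    # One list of values per named row-index level (samples)
    obs = {name: frame.index.get_level_values(name).tolist() for name in frame.index.names}
    # One list of values per named column-index level (analytes)
    var = {name: frame.columns.get_level_values(name).tolist() for name in frame.columns.names}
    # Stringify header values so they serialize cleanly in uns
    uns = {key: str(value) for key, value in header_metadata.items()}
    return {"X": frame.to_numpy(), "obs": obs, "var": var, "uns": uns}
```

Usage would then reduce to `adata = ad.AnnData(**adat_to_anndata_kwargs(adat, adat.header_metadata))`.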

Alternatively, I updated the code (1.2.0) to streamline using parse_file and to default all header metadata values to strings. I hope this makes things easier:

from somadata import parse_file
import anndata as ad
import numpy as np

# parse_file returns the RFU matrix plus row, column, and header metadata
rfu_matrix, row_metadata, column_metadata, header_metadata = parse_file("tests/data/control_data.adat", compatibility_mode=True)

adata = ad.AnnData(X=np.array(rfu_matrix), obs=row_metadata, var=column_metadata, uns=header_metadata)
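AnnData expects `obs` to hold one entry per row of `X` and `var` one entry per column. The sketch below (the name `check_parse_file_shapes` is hypothetical) converts `rfu_matrix` to a float array and checks those lengths up front, so a mismatch reports the offending field by name. It assumes, as the `AnnData(...)` call above implies, that `row_metadata` and `column_metadata` are mappings from field name to a list of values; that return format is an assumption here.

```python
from typing import Dict, List, Sequence

import numpy as np


def check_parse_file_shapes(
    rfu_matrix: Sequence[Sequence[float]],
    row_metadata: Dict[str, List],
    column_metadata: Dict[str, List],
) -> np.ndarray:
    """Return rfu_matrix as a 2-D float array after checking metadata lengths against it."""
    X = np.asarray(rfu_matrix, dtype=float)
    if X.ndim != 2:
        raise ValueError(f"expected a 2-D RFU matrix, got {X.ndim} dimension(s)")
    n_obs, n_vars = X.shape
    # Row metadata must match the number of samples, column metadata the number of analytes
    for label, metadata, expected in (("row", row_metadata, n_obs), ("column", column_metadata, n_vars)):
        for name, values in metadata.items():
            if len(values) != expected:
                raise ValueError(
                    f"{label} metadata field {name!r} has {len(values)} values; expected {expected}"
                )
    return X
```

It would slot into the call above as `X=check_parse_file_shapes(rfu_matrix, row_metadata, column_metadata)`.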

Please reach out if there are any questions. Happy to discuss further!

@megadesk (Author)

Hi @kyoung73,

Yes, these are all workable solutions for accessing the somadata.io.adat.file.parse_file() function.

I look forward to the addition of AnnData as an optional dependency. It is a perfect fit for this type of data (a feature matrix with row/column metadata).
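For illustration, one common way to keep a package such as AnnData optional is to import it only when a function that needs it is called. The helper name `require_anndata` and its error message are hypothetical; only the standard-library `importlib.import_module` is used.

```python
import importlib
from types import ModuleType


def require_anndata() -> ModuleType:
    """Import anndata on demand, raising an informative error if it is missing.

    Deferring the import to call time means the rest of a package can be
    imported and used without anndata installed.
    """
    try:
        return importlib.import_module("anndata")
    except ImportError as err:
        raise ImportError(
            "This function needs the optional 'anndata' package; "
            "install it with `pip install anndata`."
        ) from err
```

A function like the proposed `read_adat_2_AnnData` could call `ad = require_anndata()` as its first step. This is usually paired with an extra declared under `[project.optional-dependencies]` in `pyproject.toml`; the extra's name would be up to the maintainers.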

The MultiIndex pandas DataFrame-based object is not very end-user friendly compared with operations on an AnnData object.

Also, storing the data matrix and the row/column metadata in an AnnData object would remove the need for most of the helper functions in the somadata codebase.
