Use `sleap-io` video backend #79

aaprasad · 2024-08-30T14:39:06Z

Here we switch to using sleap_io as the video backend for animal data

Summary by CodeRabbit

New Features
- Introduced batch processing for model evaluation and tracking, enhancing efficiency.
- Improved error handling and user prompts for configuration inputs.
Bug Fixes
- Enhanced dataset and dataloader methods to handle empty datasets, improving robustness.
Refactor
- Streamlined video handling and configuration management for better performance and clarity.
Style
- Improved logging for better user insights during data processing and evaluation.

coderabbitai · 2024-08-30T14:39:17Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The changes involve refactoring how video files and configurations are managed across various modules. Key modifications include transitioning from list-based to dictionary-based video reader management, enhancing error handling for dataset retrieval, and shifting to batch processing for model evaluations. Additionally, input data normalization and improved logging practices were introduced to enhance clarity and robustness in handling datasets and model outputs.

Changes

Files	Change Summary
`dreem/datasets/sleap_dataset.py`	Refactored video handling from a list to a dictionary for video readers, improved error handling in `get_instances`, and updated destructor for proper resource management.
`dreem/inference/eval.py`	Modified configuration handling in `run` function to support batch processing with CSV input, improved user prompts for pod index, and retained checkpoint loading conditionally.
`dreem/inference/track.py`	Updated `run` function to replace checkpoint loading with batch configuration handling, enhanced error prompts, and refined logging for hyperparameters.
`dreem/io/config.py`	Enhanced `get_dataset` and `get_dataloader` methods to return `None` for empty datasets, added logging for warnings, and implemented consistent file sorting for labels and videos.
`dreem/models/embedding.py`	Added normalization for the `times` tensor in the `_learned_temp_embedding` method, ensuring consistent input scaling.
`dreem/models/gtr_runner.py`	Simplified video name extraction and updated ground truth track ID handling by using list comprehension instead of tensor methods, improving code clarity.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Config
    participant Model
    participant Dataset

    User->>Config: Request batch configuration
    Config->>Model: Load hyperparameters from CSV
    Model-->>User: Confirm hyperparameters loaded
    User->>Dataset: Request dataset
    Dataset-->>User: Return dataset or None
    User->>Model: Start evaluation with dataset
    Model-->>User: Return evaluation results

🐰 In the meadow where I hop and play,
New changes come to brighten the day!
With videos neat and datasets bright,
Batch processing makes everything right!
So let’s dance and twirl in delight,
For clearer paths and data in sight! 🌼✨

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Caution

Inline review comments failed to post

Actionable comments posted: 4

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 2af0dd5 and 41454f7.

Files selected for processing (6)

dreem/datasets/sleap_dataset.py (4 hunks)
dreem/inference/eval.py (1 hunks)
dreem/inference/track.py (2 hunks)
dreem/io/config.py (6 hunks)
dreem/models/embedding.py (1 hunks)
dreem/models/gtr_runner.py (2 hunks)

Additional context used

Ruff

dreem/inference/eval.py

30-30: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

56-56: f-string without any placeholders

Remove extraneous f prefix

(F541)

dreem/inference/track.py

100-100: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

dreem/datasets/sleap_dataset.py

163-163: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

dreem/io/config.py

200-200: Undefined name SleapDataset

(F821)

200-200: Undefined name MicroscopyDataset

(F821)

200-200: Undefined name CellTrackingDataset

(F821)

261-261: Undefined name SleapDataset

(F821)

261-261: Undefined name MicroscopyDataset

(F821)

261-261: Undefined name CellTrackingDataset

(F821)

Additional comments not posted (7)

dreem/inference/track.py (1)

137-144: Enhance logging for tracking and results saving.

The logging statements provide valuable insights into the tracking process and results saving. Ensuring that these logs are informative and clear is crucial for debugging and user understanding.

The logging statements are well-placed and provide necessary information about the process flow.

dreem/models/gtr_runner.py (1)

Line range hint 304-325: Simplify video name extraction and improve data storage logic.

The changes to simplify the extraction of video names and the handling of ground truth track IDs are well-thought-out. These modifications enhance clarity and maintainability of the code. However, ensure that the new method of extracting gt_track_id is thoroughly tested to prevent any data integrity issues.

The changes are appropriate and improve the code quality. Ensure thorough testing to confirm the functionality.

dreem/datasets/sleap_dataset.py (2)

109-109: Refactor: Initialize vid_readers dictionary.

The initialization of self.vid_readers as an empty dictionary is a good practice as it prepares the class to manage video readers efficiently. This change aligns with the shift from a list-based to a dictionary-based management system, enhancing performance and memory management.

370-370: Proper Resource Management in Destructor.

The update to the destructor method to iterate over self.vid_readers and close each reader is crucial for preventing resource leaks. This ensures that all video readers are properly closed before the object is garbage collected, aligning with best practices for resource management.

dreem/models/embedding.py (1)

326-326: Normalization Added to _learned_temp_embedding.

The addition of normalization to the times tensor by dividing it by its maximum value is a crucial enhancement. This ensures that the tensor is scaled between 0 and 1, which can prevent issues related to varying scales in the input data. This change is likely to improve the robustness and stability of the model's learning process.

dreem/io/config.py (2)

Line range hint 200-257: Enhanced Robustness in get_dataset.

The update to allow the get_dataset method to return None if the dataset is empty is a significant improvement in error handling. This change ensures that the application can gracefully handle cases where no valid data is available, preventing potential errors in downstream processing. The logging of a warning when the dataset is empty is also a good practice, as it informs the user or developer about the issue.

Tools

Ruff

261-261: Undefined name SleapDataset

(F821)

261-261: Undefined name MicroscopyDataset

(F821)

261-261: Undefined name CellTrackingDataset

(F821)

Line range hint 261-313: Improved Error Handling in get_dataloader.

The modifications to the get_dataloader method to handle cases where the dataset might be None or empty enhance the robustness of data handling. The addition of warnings when the dataset is None or empty is crucial for informing the user about potential issues, preventing errors during data loading. This change aligns with best practices in error handling and user feedback.

Comments failed to post (4)

dreem/inference/eval.py (2)
46-57: Ensure consistent logging and information display.

The logging statements are helpful for debugging and understanding the flow of batch processing. However, the use of an f-string without placeholders in line 56 is unnecessary and should be corrected.

Remove the extraneous f prefix from the logging statement to clean up the code:
- logger.info(f"Using the following tracker:")
+ logger.info("Using the following tracker:")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
        hparams = {}

    checkpoint = eval_cfg.cfg.ckpt_path

    logger.info(f"Testing model saved at {checkpoint}")
    model = GTRRunner.load_from_checkpoint(checkpoint)

    model.tracker_cfg = eval_cfg.cfg.tracker
    model.tracker = Tracker(**model.tracker_cfg)

    logger.info("Using the following tracker:")
Tools

Ruff

56-56: f-string without any placeholders

Remove extraneous f prefix

(F541)

29-44: Refactor batch configuration handling and improve error messages.

The changes to handle batch configurations and improve error handling are significant. The use of cfg.keys() should be simplified to key in cfg as suggested by static analysis. Additionally, the error message can be made more user-friendly by providing a clearer prompt when the POD_INDEX is not found.

Apply these changes to improve code readability and error handling:
- if "batch_config" in cfg.keys():
+ if "batch_config" in cfg:

- input(f"{e}. Assuming single run!\nPlease input task index to run:")
+ input(f"Environment variable POD_INDEX not found. Assuming single run!\nPlease input task index to run:")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    # update with parameters for batch train job
    if "batch_config" in cfg:
        try:
            index = int(os.environ["POD_INDEX"])
        except KeyError as e:
            index = int(
                input("Environment variable POD_INDEX not found. Assuming single run!\nPlease input task index to run:")
            )

        hparams_df = pd.read_csv(cfg.batch_config)
        hparams = hparams_df.iloc[index].to_dict()
        _ = hparams.pop("Unnamed: 0", None)

        if eval_cfg.set_hparams(hparams):
            logger.info("Updated the following hparams to the following values")
            logger.info(hparams)
Tools

Ruff

30-30: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)
dreem/inference/track.py (1)
99-114: Refactor batch configuration handling and improve error messages.

The changes to handle batch configurations and improve error handling are significant. The use of cfg.keys() should be simplified to key in cfg as suggested by static analysis. Additionally, the error message can be made more user-friendly by providing a clearer prompt when the POD_INDEX is not found.

Apply these changes to improve code readability and error handling:
- if "batch_config" in cfg.keys():
+ if "batch_config" in cfg:

- input(f"{e}. Assuming single run!\nPlease input task index to run:")
+ input(f"Environment variable POD_INDEX not found. Assuming single run!\nPlease input task index to run:")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    # update with parameters for batch train job
    if "batch_config" in cfg:
        try:
            index = int(os.environ["POD_INDEX"])
        except KeyError as e:
            index = int(
                input("Environment variable POD_INDEX not found. Assuming single run!\nPlease input task index to run:")
            )

        hparams_df = pd.read_csv(cfg.batch_config)
        hparams = hparams_df.iloc[index].to_dict()
        _ = hparams.pop("Unnamed: 0", None)

        if pred_cfg.set_hparams(hparams):
            logger.info("Updated the following hparams to the following values")
            logger.info(hparams)
Tools

Ruff

100-100: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)
dreem/datasets/sleap_dataset.py (1)
162-167: Error Handling Improvement in get_instances.

The addition of a try-except block to handle FileNotFoundError when accessing video frames is a significant improvement. This change ensures that the application can gracefully handle missing files and continue operation by attempting to load the video again. However, the local variable e is assigned but never used, which is flagged by static analysis tools.

Consider removing the assignment to e since it's not used, or log the exception for debugging purposes:
- except FileNotFoundError as e:
+ except FileNotFoundError:
+     logger.error("File not found: {video_name}")
Committable suggestion was skipped due to low confidence.

Tools

Ruff

163-163: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

- Add support for specifying multiple train/val directories in a list, for data paths in config files. Currently only single directory is supported

…el checkpoints

…olab/dreem into aadi/sleap-io-video-backend

use sleap-io as video backend instead of imageio

41454f7

aaprasad changed the base branch from main to aadi/batch-inference-eval August 30, 2024 14:44

coderabbitai bot reviewed Aug 30, 2024

View reviewed changes

lint

64c970b

aaprasad linked an issue Aug 30, 2024 that may be closed by this pull request

Change video backend from imageio PIL.Image and skimage.io to sleap_io.Video? #56

Open

aaprasad removed a link to an issue Aug 30, 2024

Change video backend from imageio PIL.Image and skimage.io to sleap_io.Video? #56

Open

aaprasad linked an issue Aug 30, 2024 that may be closed by this pull request

Change video backend from imageio PIL.Image and skimage.io to sleap_io.Video? #56

Open

aaprasad and others added 15 commits October 17, 2024 10:51

Refactor config and patch small typo (#87)

38c4cb3

patch small windows file name bug (#88)

89599bc

patch get_datapaths to use dict.get (#90)

a90b2be

patch checkpointing (#92)

9356a68

disable struct mode in init for pop method later (#94)

11a33c8

Mustafa multiple data paths (#97)

9bf0af7

- Add support for specifying multiple train/val directories in a list, for data paths in config files. Currently only single directory is supported

Mustafa multiple crop size (#100)

9cef2a2

Refactor visual encoder and features (#101)

2f5deda

Log train split files (#102)

4341167

Filter max dist (#103)

d79f2f7

clean up print statement (#104)

08256ae

Bug fix max filter dist (#105)

ef05900

allow batch eval/inference flexibility rather than just different mod…

7753d64

…el checkpoints

use sleap-io as video backend instead of imageio

b1f09b5

Merge branch 'aadi/sleap-io-video-backend' of https://github.com/talm…

4ac6d59

…olab/dreem into aadi/sleap-io-video-backend

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use `sleap-io` video backend #79

Use `sleap-io` video backend #79

aaprasad commented Aug 30, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Aug 30, 2024 •

edited

Loading

Review skipped

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

Use sleap-io video backend #79

Are you sure you want to change the base?

Use sleap-io video backend #79

Conversation

aaprasad commented Aug 30, 2024 • edited by coderabbitai bot Loading

Summary by CodeRabbit

coderabbitai bot commented Aug 30, 2024 • edited Loading

Review skipped

Walkthrough

Changes

Sequence Diagram(s)

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

Use `sleap-io` video backend #79

Use `sleap-io` video backend #79

aaprasad commented Aug 30, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Aug 30, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)