Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix missing instance frame #108

Merged
merged 2 commits into from
Feb 4, 2025
Merged

Fix missing instance frame #108

merged 2 commits into from
Feb 4, 2025

Conversation

shaikh58
Copy link
Contributor

@shaikh58 shaikh58 commented Feb 4, 2025

We are currently not checking for missing frames in the input .slp file to ensure alignment between expected number of frames from the video, and LabeledFrame count. We assume that frame_ind from chunked frames matches lf.frame_idx but it does not. The issue is that we index the labels based on enumerating the length of the slp Labels() object. We need to ensure the frame_ind in SleapDataset matches lf.frame_idx

This line highlights the source of the issue (len(labels)): self.frame_idx = [torch.arange(len(labels)) for labels in self.labels]
Fix: First, enumerate frame_idx using the lf.frame_idx value of the last frame in the slp file. This ensures the indexing stays correct till the end of the video. Then, since there will now be missing frame ids in slp, check that the chunk’s frame index is equal to lf.frame_idx, and if not, skip the frame. But, we need to keep track of how many frames we’ve skipped, otherwise even if frame_ind = lf.frame_idx, trying to index labels[frame_ind] will overshoot the correct frame! Also, since frames can be skipped across batches, the skip counter must be part of the SleapDataset state so that it persists throughout the entire video! (Rather than being reset every batch)

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced dataset reliability by refining the frame loading process to better accommodate missing frames.
    • Improved error feedback during video frame processing, resulting in a more stable and predictable experience.

Copy link
Contributor

coderabbitai bot commented Feb 4, 2025

Walkthrough

The pull request improves the robustness of the SleapDataset class by modifying how frame indices are determined and handled during video frame loading. Instead of using the length of the labels, the updated implementation derives the frame range from the last label frame index, which better accommodates missing frames. Additionally, a new attribute, skipped_frame_ct, is introduced to count skipped frames for each label. The get_instances method now adjusts the frame index by subtracting the skipped frames count and logs warnings when a requested frame lacks valid instances.

Changes

File Path Change Summary
dreem/.../sleap_dataset.py Updated frame indexing: uses last label frame index instead of label length; introduced skipped_frame_ct to track missing frames; modified get_instances to adjust indices and log warnings when frames are missing.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant SleapDataset
    participant Logger

    Client->>SleapDataset: get_instances(label_idx, frame_idx)
    Note over SleapDataset: Determine frame range from last label frame index
    SleapDataset->>SleapDataset: Check if frame index is valid
    alt Valid Frame
        SleapDataset->>SleapDataset: Adjust frame index (subtract skipped_frame_ct)
        SleapDataset->>Client: Return frame instances
    else Missing Frame
        SleapDataset->>Logger: Log warning about missing frame
        SleapDataset->>Client: Return empty or adjusted instance list
    end
Loading

Possibly related PRs

  • Fix empty batch #106: Addresses issues with empty batches in a similar dataset context, enhancing robustness in frame handling.

Poem

I'm a rabbit with a joyful hop,
Skipping through frames without a drop,
Counting missed beats with skipped_frame_ct so fine,
Logging a warning when things aren't in line,
Our new flow brings order, smooth and bright,
Hoppy coding days and debugging light!
🐰✨

Tip

🌐 Web search-backed reviews and chat
  • We have enabled web search-based reviews and chat for all users. This feature allows CodeRabbit to access the latest documentation and information on the web.
  • You can disable this feature by setting web_search: false in the knowledge_base settings.
  • Please share any feedback in the Discord discussion.
✨ Finishing Touches
  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
dreem/datasets/sleap_dataset.py (2)

146-152: LGTM! Consider adding docstring for skipped_frame_ct.

The change to use labels[-1].frame_idx + 1 instead of len(labels) is a robust fix for handling missing frames. The addition of skipped_frame_ct helps track missing frames across batches.

Add a docstring comment for the skipped_frame_ct attribute:

 self.frame_idx = [
     torch.arange(labels[-1].frame_idx + 1) for labels in self.labels
 ]
+# Counter for tracking the number of frames skipped due to missing instances
+# for each label file
 self.skipped_frame_ct = [0 for labels in self.labels]

209-217: Improve error handling and documentation.

The frame index adjustment logic is correct, but the error handling and documentation could be enhanced.

Consider these improvements:

-            # if slp is missing instances in some frames, frame_ind will be smaller than lf.frame_idx
+            # Adjust frame index by subtracting skipped frames to maintain correct indexing
             lf = video[frame_ind - self.skipped_frame_ct[label_idx]]
             if frame_ind < lf.frame_idx:
                 logger.warning(
-                    f"Frame index {frame_ind} is trying to access frame {lf.frame_idx} of the slp file {video_name}. "
-                    f"This likely means there are no labelled instances in this frame. Skipping frame."
+                    f"Frame index mismatch in {video_name}: "
+                    f"Expected frame {frame_ind} but found frame {lf.frame_idx}. "
+                    f"No labeled instances found. Skipping frame and incrementing skip counter."
                 )
+                # Increment skip counter to maintain correct indexing in subsequent frames
                 self.skipped_frame_ct[label_idx] += 1
                 continue
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8d1b920 and 2dfefe2.

📒 Files selected for processing (1)
  • dreem/datasets/sleap_dataset.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Tests (windows-2022, Python 3.9)
  • GitHub Check: Tests (ubuntu-22.04, Python 3.9)
  • GitHub Check: Lint
🔇 Additional comments (1)
dreem/datasets/sleap_dataset.py (1)

209-211: Verify edge case: First frame missing.

Please verify the behavior when the first frame is missing. The current implementation assumes frame_ind < lf.frame_idx catches missing frames, but we should ensure it works correctly when frame_ind = 0.

Run this test to verify the edge case:

✅ Verification successful

I'll generate a new shell script to display the relevant code snippet from dreem/datasets/sleap_dataset.py (lines 200-220) so that we can verify the edge case handling when frame_ind is 0.


Edge Case Handling for Missing First Frame Verified

The code snippet confirms that when frame_ind is 0 and the corresponding lf.frame_idx is greater than 0 (as expected when the first frame is missing), the condition if frame_ind < lf.frame_idx: will be met. In this case, the warning is logged, the skip counter is incremented, and processing continues appropriately.

  • The condition correctly catches a missing first frame when frame_ind == 0.
  • The logic ensures proper warning and skip behavior when the frame is missing.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for instances where the first frame (index 0) is missing
# Look for frame_idx values in .slp files to verify handling of missing first frame

ast-grep --pattern 'frame_idx = 0' | grep -B 2 -A 2

Length of output: 136


Script:

#!/bin/bash
# Display code snippet for dreem/datasets/sleap_dataset.py in the range covering the edge case (lines 200-220)
sed -n '200,220p' dreem/datasets/sleap_dataset.py

Length of output: 918

@shaikh58 shaikh58 merged commit 9699962 into main Feb 4, 2025
4 checks passed
@shaikh58 shaikh58 deleted the fix-missing-instance-frame-id branch February 4, 2025 07:38
@coderabbitai coderabbitai bot mentioned this pull request Feb 7, 2025
@shaikh58
Copy link
Contributor Author

shaikh58 commented Feb 8, 2025

This fix is known to cause bugs in certain situations e.g. if the frame index of the first frame in a slp file is not 0. The underlying issue has been resolved by #109. The code related to this patch has also been removed as it is redundant

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant