Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve parsing (almost) empty files #156

Merged
merged 2 commits into from
Aug 3, 2024

Conversation

jannis-baum
Copy link
Owner

Close #155

@jannis-baum
Copy link
Owner Author

@tuurep: Nice, sounds like an easy problem 😄

So the thing that you observed is easy to fix, but while fixing this I found another issue that is not easy to fix😅

Files that have only 1 character, e.g. a line feed (0a) are recognized by file as arbitrary binary data (application/octet-stream). So if you create a file with Vim, save it, start using Vivify, write some stuff, delete it (up to here everything works), but then close Vivify until the server shuts down and then call viv again on the "empty" file, it will break and download the file because Vivify will think it's binary because of that one single character (the line break that Vim saved). Same thing probably as well if you just create the file, write some stuff, delete "everything", save it, and then call :Vivify.

Not sure what a good way to fix this is. Maybe check the file's content and fall back to plain text if the length is 1 byte and the character is printable...

@jannis-baum
Copy link
Owner Author

I implemented manual checking now:

  1. If the file has 0 bytes, we always return inode/x-empty (@Tweekism this already fixes the discrepancy with file for Windows that you mentioned on Parse empty files #150)
  2. If the file has 1 byte, we check if that byte is a printable character, and if so return text/plain, which will trigger the renderer that will then interpret the file as Markdown if it has a Markdown extension

@jannis-baum jannis-baum requested a review from tuurep August 3, 2024 15:49
@tuurep
Copy link
Collaborator

tuurep commented Aug 3, 2024

Tested and works beautifully now 👍

Implementation looks quite painful and what an unexpected problem once again 😄 Do you think it's good & worth it?

Copy link
Collaborator

@tuurep tuurep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd approve based on testing this

@jannis-baum jannis-baum merged commit 2df370a into main Aug 3, 2024
6 checks passed
@jannis-baum jannis-baum deleted the issue/155-improve-parsing-almost-empty-files branch August 3, 2024 19:08
@jannis-baum
Copy link
Owner Author

Implementation looks quite painful and what an unexpected problem once again 😄 Do you think it's good & worth it?

Yeah haha. I do think it's good & worth it. It was definitely a bug, no way to spin that around haha, so it was necessary to fix.

The implementation should be very robust. 0-byte files will always be empty and 1-byte files with a printable character can't be anything "useful" that serves some purpose that breaks when we interpret them as text files.

Performance-wise it's also fine: For the majority of cases (files bigger than 1 byte) it's the same as before (the child process and getting file stats are both outside processes executed in parallel) and for 1-byte files the complexity of reading 1 byte from a file is added which is negligible.

@tuurep
Copy link
Collaborator

tuurep commented Aug 3, 2024

Ok, sounds good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve parsing (almost) empty files
2 participants