Support for fileinfo / file-encoding detection issue #199

Closed
murphy83 opened this issue Sep 17, 2019 · 2 comments

@murphy83

Currently I have a project which requires some file analysis; for unit testing we use PHPUnit combined with vfsStream, which works out pretty well most of the time.
However, I encountered a problem today when trying to determine the file type or the encoding of a file in the file system.
Here is some insight into what I have tried so far:
First I ran into trouble using mb_detect_encoding: it failed to give correct answers on a real file system, which started off the investigation, as some processing further down the stream broke because of this.
I tried using "brute force" on a Linux system: exec with the 'file' command. While this works well on real file systems, of course it does not work on vfs, causing almost all unit tests for that module to fail.
I then resorted to the fileinfo functions provided by the extension. However, while this copes well with vfs://my/fileto/test.iso8859.txt, it reports us-ascii although the file is something different (e.g. iso-8859 or utf-8).
Currently I do not have any further ideas, and I think this is not the expected behavior of vfsStream.
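
One thing worth ruling out for the us-ascii result: a file that contains only bytes below 0x80 is byte-for-byte identical in US-ASCII, ISO-8859-x, and UTF-8, so a content-based detector has nothing to distinguish them by. A minimal sketch of that check follows; isPureAscii is a hypothetical helper name, not part of vfsStream or fileinfo.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of vfsStream or fileinfo):
 * true when the string contains no byte above 0x7F.
 */
function isPureAscii(string $bytes): bool
{
    // No /u modifier, so the pattern is matched byte by byte.
    return preg_match('/[^\x00-\x7F]/', $bytes) === 0;
}

$plain  = 'cafe';           // identical bytes in ASCII, ISO-8859-1 and UTF-8
$latin1 = "caf\xE9";        // 'café' encoded as ISO-8859-1
$utf8   = "caf\xC3\xA9";    // 'café' encoded as UTF-8

var_dump(isPureAscii($plain));
var_dump(isPureAscii($latin1));
var_dump(isPureAscii($utf8));
```

If a test file passes this check, a us-ascii report would be consistent with its bytes regardless of how the file was labeled or named.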

@bizurkur
Contributor

Doing a couple of very simple tests of this, it appears to work. Granted, these are very small sample strings, and it might be possible that longer strings create an issue due to a lack of support for multibyte strings. However, since you mentioned that mb_detect_encoding() failed, it almost sounds like the file content may not be entirely in the encoding you think it is.

Test of fileinfo:

<?php
// Assumes vfsStream is installed via Composer and the fileinfo extension is enabled.
require 'vendor/autoload.php';

use org\bovigo\vfs\vfsStream;

$root = vfsStream::setup();
$filename = $root->url().'/new.txt';

// Non-ASCII content: multibyte UTF-8 characters.
$content = 'áéóú';
file_put_contents($filename, $content);
$finfo = finfo_open(FILEINFO_MIME_ENCODING);
var_dump(finfo_file($finfo, $filename)); // output utf-8
finfo_close($finfo);

// ASCII-only content written to the same virtual file.
$content = 'asassdasd';
file_put_contents($filename, $content);
$finfo = finfo_open(FILEINFO_MIME_ENCODING);
var_dump(finfo_file($finfo, $filename)); // output us-ascii
finfo_close($finfo);

Can you make a reproducible sample code showing it failing?

@murphy83
Author

OK, I rechecked and I think I misunderstood the internals of finfo and most likely also the mb_detect stuff: the test files we use mostly rely on the BOM at the beginning of the file (which is the reason for the problems further down the stream). We got those files as "examples" and included them in our tests (positive / negative). But we also have a test that just tries to use a badly encoded file (converted to UTF-32, which is not valid). I will do some further checking to find the details and a way to reproduce the errors. As I am not allowed to use the example files in public, I will have to set up some truly synthetic ones, which is not too bad for testing.
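
For files that rely on a BOM, one option is to check the leading bytes directly instead of relying on content-based detection. The following is only a sketch under that assumption: detectBom is a hypothetical helper name, and the signature table covers just the common Unicode BOMs.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the encoding implied by a leading BOM,
 * or null when the data does not start with a known BOM.
 */
function detectBom(string $bytes): ?string
{
    // Longer signatures come first: the UTF-32LE BOM begins with the
    // same two bytes as the UTF-16LE BOM.
    $signatures = [
        'UTF-32BE' => "\x00\x00\xFE\xFF",
        'UTF-32LE' => "\xFF\xFE\x00\x00",
        'UTF-8'    => "\xEF\xBB\xBF",
        'UTF-16BE' => "\xFE\xFF",
        'UTF-16LE' => "\xFF\xFE",
    ];

    foreach ($signatures as $encoding => $bom) {
        // strncmp is binary safe, so NUL bytes in the BOM are compared too.
        if (strncmp($bytes, $bom, strlen($bom)) === 0) {
            return $encoding;
        }
    }

    return null;
}

var_dump(detectBom("\xEF\xBB\xBFhello"));
var_dump(detectBom("hello"));
```

Since file_get_contents() goes through PHP stream wrappers, detectBom(file_get_contents($filename)) should in principle work for a vfs:// URL as well as a real path. Synthetic test files can be built by prepending one of the byte sequences above to arbitrary content.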

@allejo allejo added the bogus label Dec 8, 2019
@allejo allejo closed this as completed Dec 8, 2019