Getting it to work with multibyte names #70
Comments
OK, it seems I managed to get it working. I just had to use https://github.com/tiamo/spss/blob/master/src/Sav/Record/Info/VariableAttributes.php#L47 Would this change be considered if I PR it?
As you already noted, the usage of So, yes it should be: Have a good day, and thanks for pointing this out.
Thanks for the heads up. I am still a bit new to the whole SPSS thing, but I will take a look at the current tests and see if I can make something work. I notice other subclasses of Info are also using I will try making a PR tomorrow :)
Yes, Info records are in general stored as ASCII. We really don't have a detailed specification for when one is used and when not. Testing what happens in practice is the safest thing for now. I have not noticed any problem in the value labels as they are right now; I also used some Latin characters there without a problem. So I hope it's fine the way it is. The value labels have several problems that need a fix, but for now those are unrelated to the size. The problems are related to using the same instances of ValueLabel in more than one variable.
Really I think that only I will see if I have the time to work on something that fixes the problem in a better way, especially for what occurs here: https://github.com/tiamo/spss/blob/master/src/Sav/Record/ValueLabel.php#L99-L106
I opened pull request #72, and there I try to make clearer what is needed. The point is that all sizes should be calculated in bytes, but the internal data is not necessarily encoded in ASCII. So cutting the string by bytes means that we can cut the last character in the wrong position because, for example, in UTF-8 one character can have more than one byte. The idea is then to find the cut in bytes that guarantees that no character becomes invalid when the string is saved in the charset of the resulting file (by default UTF-8). That needs to be tested for a while, and then, if there are no problems, it can be considered for merging.
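To illustrate the idea of a byte-based cut that never splits a character, here is a minimal sketch. It is not the code from pull request #72; the function name `truncateUtf8Bytes` is hypothetical, and it assumes the input is already valid UTF-8.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of tiamo/spss): truncate a valid UTF-8
 * string to at most $maxBytes bytes without splitting a character.
 */
function truncateUtf8Bytes(string $text, int $maxBytes): string
{
    if ($maxBytes <= 0) {
        return '';
    }
    if (strlen($text) <= $maxBytes) {
        return $text; // already fits; strlen() counts bytes, not characters
    }
    $cut = $maxBytes;
    // UTF-8 continuation bytes look like 10xxxxxx. If the first dropped byte
    // is one, the cut falls inside a character, so step back to its lead byte.
    while ($cut > 0 && (ord($text[$cut]) & 0xC0) === 0x80) {
        $cut--;
    }
    return substr($text, 0, $cut);
}

$name = "\u{00E6}\u{00F8}\u{00E5}"; // "æøå", 2 bytes per character in UTF-8
var_dump(truncateUtf8Bytes($name, 3) === "\u{00E6}"); // bool(true)
```

With a limit of 3 bytes, a plain `substr($name, 0, 3)` would keep half of "ø"; the sketch backs off to 2 bytes and keeps only "æ". If the mbstring extension is available, the built-in `mb_strcut()` performs a comparable byte-based cut at character boundaries.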
Ah, now that is quite interesting. Both seem like great PRs. I will pull them and see if I can break them in any way :) Great work! Edit: Which version of PHP are you using while developing? I notice that I cannot install the Composer dependencies with PHP 8. I can do a PR to update it to work with PHP ^8.0.
I don't know what @tiamo wants to do about the new PHP versions; it is becoming tricky to support them all with the same code, and I don't know what can be dropped. So I don't want to take any decision on that for now. I have PHP 8.2.7, but I use my own copy of the library. I have a wrapper class that uses the library internally, and that way it is easy for me to integrate any changes without depending on a specific version of the library and without breaking the rest of the code. So I really don't care what version of PHP the library supports.
Yeah, I have it running in a PHP 8.2 project as well and it works just fine. The issue is that when working on a PR I am forced to use an old version of PHP to install the dependencies. I think I will just make a temporary Docker container with PHP 7.4 for running tests while making PRs :)
Well, in reality there is only one dependency, and on Debian installing it is as easy as executing the command:
<?php
// Load the test base class and the library's source files manually.
$rootPathSPSS = dirname(__DIR__);
require_once($rootPathSPSS."/tests/TestCase.php");
require_once($rootPathSPSS."/src/Utils.php");
require_once($rootPathSPSS."/src/Exception.php");
require_once($rootPathSPSS."/src/Buffer.php");
require_once($rootPathSPSS."/src/Sav/Variable.php");
require_once($rootPathSPSS."/src/Sav/RecordInterface.php");
require_once($rootPathSPSS."/src/Sav/Record.php");
require_once($rootPathSPSS."/src/Sav/Record/InfoCollection.php");
require_once($rootPathSPSS."/src/Sav/Record/Header.php");
require_once($rootPathSPSS."/src/Sav/Record/Variable.php");
require_once($rootPathSPSS."/src/Sav/Record/Data.php");
require_once($rootPathSPSS."/src/Sav/Record/ValueLabel.php");
require_once($rootPathSPSS."/src/Sav/Record/Document.php");
require_once($rootPathSPSS."/src/Sav/Record/Info.php");
require_once($rootPathSPSS."/src/Sav/Record/Info/MachineInteger.php");
require_once($rootPathSPSS."/src/Sav/Record/Info/MachineFloatingPoint.php");
require_once($rootPathSPSS."/src/Sav/Record/Info/VariableDisplayParam.php");
require_once($rootPathSPSS."/src/Sav/Record/Info/LongVariableNames.php");
require_once($rootPathSPSS."/src/Sav/Record/Info/VeryLongString.php");
require_once($rootPathSPSS."/src/Sav/Record/Info/ExtendedNumberOfCases.php");
require_once($rootPathSPSS."/src/Sav/Record/Info/Unknown.php");
require_once($rootPathSPSS."/src/Sav/Record/Info/DataFileAttributes.php");
require_once($rootPathSPSS."/src/Sav/Record/Info/VariableAttributes.php");
require_once($rootPathSPSS."/src/Sav/Record/Info/CharacterEncoding.php");
require_once($rootPathSPSS."/src/Sav/Record/Info/LongStringMissingValues.php");
require_once($rootPathSPSS."/src/Sav/Record/Info/LongStringValueLabels.php");
require_once($rootPathSPSS."/src/Sav/Reader.php");
require_once($rootPathSPSS."/src/Sav/Writer.php");
?>
Thanks for the suggestion. I just made a simple Dockerfile instead :)

FROM php:7.4-cli
RUN curl -sSL https://github.com/mlocati/docker-php-extension-installer/releases/latest/download/install-php-extensions -o - | sh -s \
    bcmath zip
COPY --from=composer:latest /usr/bin/composer /usr/local/bin/composer
RUN apt-get -y update && apt-get -y install git
WORKDIR /app

docker build -t spss-app .
docker run -it --rm -v ./:/app spss-app bash
Has anyone had any luck getting this package to work with multibyte names?
If I fix the regex to allow it I get errors like
The names include Danish letters like æøå/ÆØÅ, which are valid in SPSS variable names. Example from PSPP

I have tried making my own version where I replaced all PHP string functions with their mb_ versions (like mb_strlen), but sadly it does not seem to work. :/
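One possible reason the mb_ swap does not help, assuming (as stated elsewhere in this thread) that sizes in the file must be counted in bytes: `mb_strlen` counts characters, so it reports a smaller number than the bytes actually written for names like ÆØÅ. The following is only a small illustration of that difference, not code from the library, and `mb_strlen` requires the mbstring extension.

```php
<?php
declare(strict_types=1);

// "ÆØÅ" written with Unicode escapes so the bytes are UTF-8 regardless of
// how this source file is saved.
$name = "\u{00C6}\u{00D8}\u{00C5}";

$byteLength = strlen($name);             // counts bytes
$charLength = mb_strlen($name, 'UTF-8'); // counts characters

printf("bytes: %d, characters: %d\n", $byteLength, $charLength);
// prints "bytes: 6, characters: 3"
```

A length field filled with the character count (3) would then be too short for the 6 bytes that follow it.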