
mb_{detect/convert}_encoding. Again. Or maybe pdo? #17931

Open
gzhegow1991 opened this issue Feb 25, 2025 · 0 comments


gzhegow1991 commented Feb 25, 2025

Description

(All tests were run with the online tool https://onlinephp.io/)

The following code:

<?php

// > this is a \PDOException message in Russian, roughly `No connection could be made because the target machine refused it` (i.e. changing the server configuration is not a solution here)
$str = base64_decode('U1FMU1RBVEVbSFkwMDBdIFsyMDAyXSDP7uTq6/735e3o5SDt5SDz8fLg7e7i6+Xt7iwg8i7qLiDq7u3l9+376SDq7uzv/P7y5fAg7vLi5fDjIOfg7/Du8SDt4CDv7uTq6/735e3o5S4NCiAoU1FMOiBTRVQgRk9SRUlHTl9LRVlfQ0hFQ0tTPTA7KQ==');

// > printed as-is, it looks like this:
// ###
// Warning: Your output contains characters that could not be displayed. Make sure you encode the output when working with special characters or binary data. [Click here for an example on how to do this](https://onlinephp.io/code/utf8-in-the-sandbox)
// SQLSTATE[HY000] [2002] ����������� �� �����������, �.�. �������� ��������� ������ ������ �� �����������.
//  (SQL: SET FOREIGN_KEY_CHECKS=0;)
// ###


$mbListEncodings = mb_list_encodings();


$detect = mb_detect_encoding($str);

// PHP_VERSION_ID < 80300 -> 'UTF-8'
// PHP_VERSION_ID >= 80300 -> 'ASCII'
var_dump($detect);


$detect2 = mb_detect_encoding($str, $mbListEncodings, true);
// $detect2 = mb_detect_encoding($str, $mbListEncodings); // > not quite the same result: without $strict = true it may return 'ASCII' when none of the results below applies; with $strict = true it returns false in that case

// PHP_VERSION_ID < 80100 -> 'ISO-8859-1'
// PHP_VERSION_ID >= 80100 -> 'Windows-1252'
var_dump($detect2);


// > it happens to work here, but only for PHP_VERSION_ID >= 80100
array_unshift($mbListEncodings, 'CP1251');
array_unshift($mbListEncodings, 'Windows-1251');
$detect3 = mb_detect_encoding($str, $mbListEncodings, true);

// PHP_VERSION_ID < 80100 -> 'ISO-8859-1' // > !!! looks like an old bug
// PHP_VERSION_ID >= 80100 -> 'Windows-1251'
var_dump($detect3);


$cpDetectedWrong = 'Windows-1252';

$converted = mb_convert_encoding($str, 'UTF-8', $cpDetectedWrong);
$converted_b64 = base64_encode($converted);

var_dump($converted); // string(207) "SQLSTATE[HY000] [2002] Ïîäêëþ÷åíèå íå óñòàíîâëåíî, ò.ê. êîíå÷íûé êîìïüþòåð îòâåðã çàïðîñ íà ïîäêëþ÷åíèå.
//  (SQL: SET FOREIGN_KEY_CHECKS=0;)"

var_dump($converted_b64); // string(276) "U1FMU1RBVEVbSFkwMDBdIFsyMDAyXSDDj8Ouw6TDqsOrw77Dt8Olw63DqMOlIMOtw6Ugw7PDscOyw6DDrcOuw6LDq8Olw63Driwgw7Iuw6ouIMOqw67DrcOlw7fDrcO7w6kgw6rDrsOsw6/DvMO+w7LDpcOwIMOuw7LDosOlw7DDoyDDp8Ogw6/DsMOuw7Egw63DoCDDr8Ouw6TDqsOrw77Dt8Olw63DqMOlLg0KIChTUUw6IFNFVCBGT1JFSUdOX0tFWV9DSEVDS1M9MDsp"
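To illustrate what goes wrong, here is a minimal sketch that uses a short sample instead of the full message: the bytes of "Привет" encoded as Windows-1251 (an assumption chosen for illustration). Read as Windows-1252 they produce the same kind of Latin mojibake as above; read as Windows-1251 they produce the expected Cyrillic text.

```php
<?php
// Illustrative sample: "Привет" encoded as Windows-1251 (not the full PDO message).
$bytes = "\xCF\xF0\xE8\xE2\xE5\xF2";

// The raw bytes are not valid UTF-8, so they cannot be printed as-is.
var_dump(mb_check_encoding($bytes, 'UTF-8')); // bool(false)

// Wrong single-byte code page: Latin-looking mojibake.
$asCp1252 = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');

// Right single-byte code page: the intended Cyrillic text.
$asCp1251 = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1251');

var_dump($asCp1252); // string(12) "Ïðèâåò"
var_dump($asCp1251); // string(12) "Привет"
```

Both conversions succeed without any error, so nothing in the bytes themselves tells mb_convert_encoding() which code page was meant; the caller has to know it or guess it.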

But I expected this output instead:

<?php

$detect = mb_detect_encoding($str, mb_list_encodings(), true);
var_dump($detect); // 'Windows-1251'

$cpDetectedCorrect = 'Windows-1251';

$converted = mb_convert_encoding($str, 'UTF-8', $cpDetectedCorrect);
$converted_b64 = base64_encode($converted);

var_dump($converted); // string(207) "SQLSTATE[HY000] [2002] Подключение не установлено, т.к. конечный компьютер отверг запрос на подключение.
//  (SQL: SET FOREIGN_KEY_CHECKS=0;)"

var_dump($converted_b64); // string(276) "U1FMU1RBVEVbSFkwMDBdIFsyMDAyXSDQn9C+0LTQutC70Y7Rh9C10L3QuNC1INC90LUg0YPRgdGC0LDQvdC+0LLQu9C10L3Qviwg0YIu0LouINC60L7QvdC10YfQvdGL0Lkg0LrQvtC80L/RjNGO0YLQtdGAINC+0YLQstC10YDQsyDQt9Cw0L/RgNC+0YEg0L3QsCDQv9C+0LTQutC70Y7Rh9C10L3QuNC1Lg0KIChTUUw6IFNFVCBGT1JFSUdOX0tFWV9DSEVDS1M9MDsp"

I've tried mb_check_encoding()... I've spent a few hours playing with mb_detect_order() and mb_list_encodings()... I even tried splitting the known encodings into groups by their first letters or slugs and applying mb_convert_encoding() to each group to get better detection.
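A sketch of why mb_check_encoding() cannot settle this, again using the illustrative "Привет" bytes (an assumption, not the real message): every byte in the sample is defined in both Windows-1251 and Windows-1252, so both checks pass and only UTF-8 is ruled out.

```php
<?php
// Illustrative sample: "Привет" encoded as Windows-1251.
$bytes = "\xCF\xF0\xE8\xE2\xE5\xF2";

// mb_check_encoding() only answers "is this byte sequence valid in this encoding?",
// not "is this the encoding the text was written in?".
$results = [];
foreach (['UTF-8', 'Windows-1251', 'Windows-1252'] as $encoding) {
    $results[$encoding] = mb_check_encoding($bytes, $encoding);
}

var_dump($results);
// UTF-8 => false, Windows-1251 => true, Windows-1252 => true
```

With several single-byte candidates that all pass, the answer from mb_detect_encoding() depends on the order of the candidate list and on mbstring's internal heuristics, which is consistent with the version-dependent results above.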

No luck: it just doesn't work, and I have to work around it like this:

<?php
set_exception_handler(function (\Throwable $e) {
    $phpMessage = $e->getMessage();

    if ($e instanceof \PDOException) {
        // preg_match() with the /u modifier fails (returns false) on invalid UTF-8.
        $isUtf8 = preg_match('//u', $phpMessage) === 1;
        if (! $isUtf8) {
            $isWindows = (strtoupper(substr(PHP_OS, 0, 3)) === 'WIN');
            if ($isWindows) {
                // Assume the system ANSI code page (CP1251 on a Russian Windows).
                $phpMessage = mb_convert_encoding($phpMessage, 'UTF-8', 'CP1251');
            }
        }
    }

    /// ...code
});
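A slightly more explicit variant of the same workaround, as a sketch only: the helper name toUtf8() is hypothetical, it uses mb_check_encoding() instead of the preg_match('//u') trick and PHP_OS_FAMILY instead of parsing PHP_OS, and it assumes the caller already knows the fallback code page (Windows-1251 here) instead of detecting it.

```php
<?php
/**
 * Hypothetical helper: return $message as UTF-8, assuming that any
 * non-UTF-8 input is encoded in $fallbackEncoding.
 */
function toUtf8(string $message, string $fallbackEncoding = 'Windows-1251'): string
{
    // Already valid UTF-8 (plain ASCII included): leave it untouched.
    if (mb_check_encoding($message, 'UTF-8')) {
        return $message;
    }

    // Otherwise convert from the caller-supplied code page instead of guessing.
    return mb_convert_encoding($message, 'UTF-8', $fallbackEncoding);
}

set_exception_handler(function (\Throwable $e): void {
    $message = $e->getMessage();

    // Assumption: on Windows, PDO connection errors arrive in the ANSI code page.
    if ($e instanceof \PDOException && PHP_OS_FAMILY === 'Windows') {
        $message = toUtf8($message, 'Windows-1251');
    }

    // ... report $message
});
```

Passing the code page explicitly avoids mb_detect_encoding() altogether; the trade-off is that the fallback presumably has to match the system's ANSI code page on the machine that produced the message.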

PHP Version

PHP 8.4

Operating System

Windows 10

gzhegow1991 changed the title from "mb_{detect/convert}_encoding. Again." to "mb_{detect/convert}_encoding. Again. Or maybe pdo?" on Feb 25, 2025