Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: property is passed as integer and cannot be accessed #784

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

aistis-
Copy link

@aistis- aistis- commented Feb 11, 2025

Hi, I have an error coming from this lib when validating a complex object. There is an object which holds a private property with an unsorted array where key is integer. The array does not start with 0, but may be any positive integer.

Unfortunately I haven't had time to investigate the exact reason, but maybe someone will 🤞

TypeError: property_exists(): Argument #2 ($property) must be of type string, int given

@DannyvdSluijs
Copy link
Collaborator

Unfortunately I haven't had time to investigate the exact reason, but maybe someone will 🤞

Although your help is appreciated, I’m reluctant to merge these changes without a proper explanation. Some reproduction steps at minimum and the version(s) affected would be a great first step.

@tlamy
Copy link

tlamy commented Feb 11, 2025

As of PHP 7.x, if you json_decode() an object with
• only numeric properties
• in ascending order
• without "holes"
it will be internally converted into a list, with (of course) integer index (or properties).

So in my honest opinion it is absolutely safe to cast them to string for the sake of property_exists()

See https://www.php.net/manual/de/function.json-decode.php#126787

@aistis-
Copy link
Author

aistis- commented Feb 11, 2025

Unfortunately I haven't had time to investigate the exact reason, but maybe someone will 🤞

Although your help is appreciated, I’m reluctant to merge these changes without a proper explanation. Some reproduction steps at minimum and the version(s) affected would be a great first step.

That makes sense. I hope to find some time to add a test with the simplified object+json schema to reproduce the case.

Last version, including dependencies are used. I've discovered it today after bumping v6.0.0.0 -> v6.1.0

Copy link
Contributor

@erayd erayd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, the error that the library is reporting is correct. There is no such thing as a JSON property for which the name is not a string; that type is a requirement of the JSON spec. And JSON also does not support associative arrays, or arrays with holes.

Why would we modify this library to suppress that error, when it's correctly pointing out a problem with your input data? IMO, this is a user error, and the library behaviour is already correct.

The only exception would be if the cast was done proactively, as part of the type coercion feature. If you really do want to have the library cast this, then it should be implemented behind that flag as part of a more general assoc array -> object transformation.

@aistis-
Copy link
Author

aistis- commented Feb 11, 2025

I managed to add an example in a test for my simplified case. The test passes with the fix and does not without.

return [
[
'{"some_property":{"some_sub_property":{"4":10,"2":10}}}',
'{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test seems needlessly complicated - what is it that you are actually intending to test? Tests should test one thing, clearly. If there is a need to test multiple things, there should be a clear test for each thing.

Copy link
Author

@aistis- aistis- Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an example from my case. As I mentioned before, unfortunately, I can not look into that more closely and trace the root cause at the moment.

Hope to come back to this later or maybe someone else jumps into the same problem and finds time to dig it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it stands currently, it's not an appropriate test. Tests need to have a clear purpose; they can't just be a paste of "this is a piece of my project that I think should pass validation".

I can not look into that more closely and trace the root cause at the moment.

If this relates to the error reported in your original comment, the root cause is as the error says: you are attempting to use an integer as a JSON property name, which is in violation of the spec. You can't do that.

JSON has no concept of arrays that use arbitrary keys (i.e. associative arrays). You cannot legitimately use an array with a non-zero basis, or with index holes, because that is by definition an associative array. If you want to validate it, you'll need to turn it into a JSON-compliant object.

Copy link
Author

@aistis- aistis- Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it stands currently, it's not an appropriate test. Tests need to have a clear purpose; they can't just be a paste of "this is a piece of my project that I think should pass validation".

That's right.

you are attempting to use an integer as a JSON property name, which is in violation of the spec. You can't do that.

The JSON property is a numeric string. e.g. "4" or "2" in '{"some_property":{"some_sub_property":{"4":10,"2":10}}}'. To my knowledge it is a perfectly valid JSON. Is there a particular spec rule which states that a field cannot be made out of 0-9 characters only?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A string containing a number is fine - that's still a string, which is what matters.

My comment about using integers as property names relates to your original post in this PR: using integers as property names is what you said that you are doing to trigger the error. I wasn't referring to the test content.

Copy link
Author

@aistis- aistis- Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I thought that integer keys of an array caused the issue in the domain model. However the model is normalized into an array before it is passed to JSON validator and the integer keys are already casted into regular strings. So at the end of the day, validator is not able to handle JSON fields which are numeric strings.

This was a dirty workaround, so I made a PR hoping me or someone else could come to this later.

I've tried to simplify the test input, but was unable to do it quickly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validator is not able to handle JSON fields which are numeric strings.

This seems extremely odd - but would definitely be a bug if it is the case. I will investigate that today.

Is there any particular situation which is required to trigger it, or is it simply any property with a numeric string as the name?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a more simple example. I'll push it soon.

@erayd erayd added Not a bug issue-isn't-ours The issue isn't in this project; please contact the project that actually contains the problem. labels Feb 11, 2025
Copy link
Author

@aistis- aistis- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that casting into string is a workaround and it should be traced why string from the input was casted into integer in the first place 🤔

[
'{
"prop1": {
"prop2": "a"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test passes without a fix.

[
'{
"prop1": {
"123": "a"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test does not pass without a fix.

@aistis-
Copy link
Author

aistis- commented Feb 11, 2025

Turns out this is due to php design to silently cast keys which are numeric strings in array to integers.

Strings containing valid decimal ints, unless the number is preceded by a + sign, will be cast to the int type. E.g. the key "8" will actually be stored under 8. On the other hand "08" will not be cast, as it isn't a valid decimal integer.

php/php-src#9029
https://www.php.net/manual/en/language.types.array.php#language.types.array.syntax

In this case, maybe the fix/workaround actually makes sense?

@DannyvdSluijs
Copy link
Collaborator

I've been reading along but I must say my initial feel is the same as @erayd

If I take the input for the test that fails without the fix and just decode it there is no array.

$json = '{"prop1": {"123": "a"}}';

var_dump(json_decode($json));

// object(stdClass)#2 (1) {
//  ["prop1"]=>
//  object(stdClass)#1 (1) {
//    ["123"]=>
//    string(1) "a"
//  }
//}

So it think as mentioned earlier somewhere in code lies an issue where the json string is being decoded into an associative array (which does not exist in other programming languages besides PHP). The true fix should be there where the incorrect decoding takes place.

The PR at least proofs there is an issue we weren't aware of so that helps a lot. It is now a matter of debugging where the issue lies.

@aistis-
Copy link
Author

aistis- commented Feb 12, 2025

I've been reading along but I must say my initial feel is the same as @erayd

If I take the input for the test that fails without the fix and just decode it there is no array.

$json = '{"prop1": {"123": "a"}}';

var_dump(json_decode($json));

// object(stdClass)#2 (1) {
//  ["prop1"]=>
//  object(stdClass)#1 (1) {
//    ["123"]=>
//    string(1) "a"
//  }
//}

So it think as mentioned earlier somewhere in code lies an issue where the json string is being decoded into an associative array (which does not exist in other programming languages besides PHP). The true fix should be there where the incorrect decoding takes place.

The PR at least proofs there is an issue we weren't aware of so that helps a lot. It is now a matter of debugging where the issue lies.

Actually what happens is that PHP cannot have array with keys which are numeric strings. They are casted into integers. So:

$json = '{"prop1": {"123": "a"}}';

var_dump(json_decode($json, true));

array(1) {
  ["prop1"]=>
  array(1) {
    [123]=>
    string(1) "a"
  }
}

I think it makes sense to add a support for keys which are integer and being validated against object schema. The error occurs only when additionalProperties are used. It does not on a simple object where keys are integers.

@aistis-
Copy link
Author

aistis- commented Feb 17, 2025

Just to summary the issue:

Prior to v6.1.0 it was supported to validate array input despite the fact does the input contain array keys as integers or not. v6.1.0 does not support array keys as integers anymore and breaks with the error.

TypeError: property_exists(): Argument #2 ($property) must be of type string, int given

Why is it important to support validating array keys as integers? Because PHP natively casts numeric strings to integers when used in array keys. This is not client/user issue, but how PHP works by design. Therefore it should be supported to validate arrays with integer keys as using numbers as keys in a JSON is a legitimate case. Otherwise, passing input as array to the validator is not reliable and should be deprecated in favour of stdClass.

@erayd
Copy link
Contributor

erayd commented Feb 17, 2025

I think it makes sense to add a support for keys which are integer and being validated against object schema.

JSON does not support non-string keys for object properties. Such a key is in violation of the spec. As such, it's not something we should support (although throwing an error if we encounter one would make good sense, so that the user knows they need to fix their data).

The error occurs only when additionalProperties are used. It does not on a simple object where keys are integers.

Thanks for the context - this is useful 🙂

Because PHP natively casts numeric strings to integers when used in array keys. This is not client/user issue, but how PHP works by design. Therefore it should be supported to validate arrays with integer keys as using numbers as keys in a JSON is a legitimate case.

JSON does not support associative arrays. There's no such thing as specifying array keys at all, of any type. JSON arrays are simply a list of values, with no keys at all. A zero-based index is used to address individual items within the list.

How can attempting to validate an associative array be anything other than a user error?

Otherwise, passing input as array to the validator is not reliable and should be deprecated in favour of stdClass.

Can you clarify, please? Passing objects is what you're already supposed to be doing, if you need to access items using a string key.

@aistis-
Copy link
Author

aistis- commented Feb 17, 2025

JSON does not support non-string keys for object properties. Such a key is in violation of the spec. As such, it's not something we should support (although throwing an error if we encounter one would make good sense, so that the user knows they need to fix their data).

I am not sure why are you referencing non-string keys in a JSON. As I mentioned before and as the test in this PR is, the keys in JSON are strings.

JSON does not support associative arrays. There's no such thing as specifying array keys at all, of any type. JSON arrays are simply a list of values, with no keys at all. A zero-based index is used to address individual items within the list.

Again, the example is not associative array, but an object where property is a string.

How can attempting to validate an associative array be anything other than a user error?
Can you clarify, please? Passing objects is what you're already supposed to be doing, if you need to access items using a string key.

Associated array is being validated because that's what jsonrainbow/json-schema is promoting for using as normalized data input. That's even being tested in lib's tests.

^^ actually after double-checking, I don't see the lib promoting to use array as input data, but I see many tests do use it. Either the case, passing an array to the validator may cause the error as validation is broken with v6.1.0.

@erayd
Copy link
Contributor

erayd commented Feb 17, 2025

I am not sure why are you referencing non-string keys in a JSON.

Because you suggested that we support them (i.e. you suggested that we support int keys). I am telling you that is a spec violation, and we should therefore not support them.

Again, the example is not associative array, but an object where property is a string.

Yep, we established that earlier. If this library is converting object string keys into ints internally somewhere, that is a bug which needs fixing.

Associated array is being validated because that's what jsonrainbow/json-schema is promoting for using as normalized data input. That's even being tested in lib's tests.

Pretty sure we don't promote that anywhere, and the test you've linked looks like one of the "this should fail" tests (albeit I only glanced at it briefly).

If you did find some documentation saying that we support it, please point that out, because it will need correcting.

^^ actually after double-checking, I don't see the lib promoting to use array as input data, but I see many tests do use it.

Which of our tests use an associative array as input data? If they're positive tests, and do not contain a suitable pre-validation cast, then that is worthy of further investigation and potentially refactoring. Most of the tests are provided as JSON via the json-schema spec test suite.

Either the case, passing an array to the validator may cause the error as validation is broken with v6.1.0.

Don't pass an associative array to the validator. It's not supported, has never been supported, and is a spec violation anyway. If you managed to do it prior to v6.1.0, that just means that we weren't throwing an error on your invalid data in a situation where we probably should have been.

@aistis-
Copy link
Author

aistis- commented Feb 17, 2025

Because you suggested that we support them (i.e. you suggested that we support int keys). I am telling you that is a spec violation, and we should therefore not support them.

To clarify, I don't suggest to support integers in JSON input. I suggesting bringing back support of integers keys in validator when input is array.

Yep, we established that earlier. If this library is converting object string keys into ints internally somewhere, that is a bug which needs fixing.

I don't think the lib is converting string to ints. Ints are coming from user input. However, user cannot pass strings there as it is not possible by design in PHP.

Pretty sure we don't promote that anywhere, and the test you've linked looks like one of the "this should fail" tests (albeit I only glanced at it briefly).

Sorry, I pasted incorrect link. This one is correct.

Which of our tests use an associative array as input data? If they're positive tests, and do not contain a suitable pre-validation cast, then that is worthy of further investigation and potentially refactoring. Most of the tests are provided as JSON via the json-schema spec test suite.

$value = json_decode($input, true);

$data = json_decode('{"propertyOne":[42]}', true);

$input = json_decode($input, true);

Don't pass an associative array to the validator. It's not supported, has never been supported, and is a spec violation anyway. If you managed to do it prior to v6.1.0, that just means that we weren't throwing an error on your invalid data in a situation where we probably should have been.

Again, you are talking about 2 different things. JSON (single string) is not passed into a validator, but either stdClass or array is. It's fine that you decide to drop input as array, but this should be highlighted in the docs then.

[Edit by @erayd: My apologies; I accidentally edited your post on my phone instead of quote-replying. I have reinstated your post as you originally provided it, and sorry for the confusion!]

@erayd
Copy link
Contributor

erayd commented Feb 17, 2025

To clarify, I don't suggest to support integers in JSON input. I am suggesting bringing back support of integers keys in validator when input is array.

JSON arrays do not have keys. Providing an associative array as input data for validation is therefore a spec violation. It's not a case of "bringing back support", but rather, it's something that should throw an error if you attempt it.

I don't think the lib is converting string to ints. Ints are coming from user input. However, user cannot pass strings there as it is not possible by design in PHP.

The comment you link seems to be taking about associative arrays. As I have stated several times now, these are a spec violation; you cannot safely use them as input. Avoid them. Provide input that is spec-compliant.

If you are referring to a problem with actual, spec-compliant objects, then can you please provide an example of what you mean? Because when I test using string keys on an object, they work precisely as expected:

<?php
$a = new StdClass();
$a->{'an explicit string key'} = 'test';
$a->{'5'} = 'five';
var_dump($a);
object(stdClass)#1 (2) {
  ["an explicit string key"]=>
  string(4) "test"
  ["5"]=>
  string(4) "five"
}

Sorry, I pasted incorrect link. This one is correct.

That one is a type-casting test. It's only applicable when using the type-casting feature; it isn't a normal test mode. I'd need to check the code to be certain (it's been a number of years since I looked closely at that feature), but from memory it will attempt an internal cast-to-object if it sees something that looks like an assoc array.

Which of our tests use an associative array as input data? If they're positive tests, and do not contain a suitable pre-validation cast, then that is worthy of further investigation and potentially refactoring. Most of the tests are provided as JSON via the json-schema spec test suite.

$value = json_decode($input, true);

See my comment above; this is a type-casting test.

$data = json_decode('{"propertyOne":[42]}', true);

This is a test that is explicitly designed to fail. It's a problem if it passes.

$input = json_decode($input, true);

This is another type-casting test. As above, using that feature should result in an internal cast occurring.

Again, you are talking about 2 different things. JSON (single string) is not passed into a validator, but either stdClass or array is. It's fine that you decide to drop input as array, but this should be highlighted in the docs then.

You're missing the point entirely. The validator expects to see spec-compliant input. That means no assoc arrays, no references, no resource types, etc. The types and structure that you provide it must comply with the JSON spec. An assoc array does not comply, and you should not pass it here without casting.

@aistis-
Copy link
Author

aistis- commented Feb 17, 2025

Provide input that is spec-compliant.

From my point of view, assoc array is just data structure to transport JSON. I am pointing out that this data structure isn't fully supported. You are keep mentioning compliance to JSON spec. Isn't these are 2 different things? I would expect the validator to accept string as the value only, but it open to pass any type of value like object, stdClass, array. I find hard to understand how the data structure is related to the spec itself if it's just a matter how the validator is going to iterate internally over the structure.

This is a simple example which shows that clients should not use array as a data structure to pass for validator.

That means no assoc arrays

Does it make sense to highlight in the docs that passing an array to the validator isn't expected?

@erayd
Copy link
Contributor

erayd commented Feb 17, 2025

From my point of view, assoc array is just data structure to transport JSON. I am pointing out that this data structure isn't fully supported.

Of course it's not supported - that data structure is a spec violation.

PHP's json_decode() function does support decoding a JSON string into an assoc array if you explicitly set the second argument to true, so perhaps that is where the confusion is? It isn't the default decode method however, and it results in ambiguous output (because in that mode, it's not possible to definitively tell the difference between an array and an object, as PHP insists on assigning index keys when decoding). This also means it's not reversible. Decoding it this way will cause problems, and should be avoided.

You are keep mentioning compliance to JSON spec. Isn't these are 2 different things?

Nope. The validator expects that its input should be valid JSON. For performance reasons, it accepts that input in decoded form (i.e. stdClass objects, non-assoc arrays, and scalars - this is what json_decode() will output by default). That's why you don't need to pass it an encoded string. But it must still be spec-compliant. You can build that structure manually, if you like - and some apps do. But if you build it manually, it's important to make sure you don't include spec violations in it (e.g. things like file handles are clearly not compliant).

I would expect the validator to accept string as the value only, but it open to pass any type of value like object, stdClass, array...

While it's technically possible to pass any value type, you aren't supposed to do this. If you pass it things that don't comply with the spec, expect to get weird results.

I find hard to understand how the data structure is related to the spec itself if it's just a matter how the validator is going to iterate internally over the structure.

It's not just about how it's iterated. The whole point of a JSON validator is to validate JSON. It isn't designed to validate things that are not JSON. When you provide a value as input to the validator, it expects that value to be valid (albeit pre-decoded) JSON.

This is a simple example which shows that clients should not use array as a data structure to pass for validator.

That is not a simple example. Can you provide a one-liner please? Arrays (plain ones, zero-indexed with no holes) are fine. Associative arrays (which are essentially dictionaries with a multi-type key) are not.

Does it make sense to highlight in the docs that passing an [assoc] array to the validator isn't expected?

Yes. I actually thought it already was somewhere, but if we don't mention that, then it absolutely needs adding to the docs. This is not the first time that someone has gotten tangled up by trying to use assoc arrays where they have no business being.

More generally, the problem of incorrect types being provided by the user is why the type casting / coercion features exist. They are a quick & dirty way to "just make it go". But they cannot by the very nature of the types involved ever be guaranteed to make the correct assumptions in all cases, so they are best avoided unless you know precisely what the implications will be.

It doesn't help that this library has a number of blind spots, and some of those can be triggered by providing data in the wrong type. Most of them have been fixed over the years, but there are still a few lurking around waiting for somebody to trip over 😞.

The documentation generally needs a massive birthday. It's not nearly as comprehensive as IMO it should be.

@aistis-
Copy link
Author

aistis- commented Feb 17, 2025

Thanks, that sheds the light on how the lib should be used 🙇

... (because in that mode, it's not possibly to definitively tell the difference between an array and an object, as PHP insists on assigning index keys when decoding)

That makes sense. I guess we should rework our objects serialization before the validator usage in our app.

@DannyvdSluijs
Copy link
Collaborator

@aistis- am I correct in assuming you and @erayd now have a shared understanding about why we dont want to support integer keys and that the problem lies in the library not providing any feedback about being misused and a general lack of documentation?

If that is true I would opt to close this PR and open two new issues for the problems summarized above.

@aistis-
Copy link
Author

aistis- commented Feb 17, 2025

As I understand, passing assoc array isn't supported even it worked in the past, have some tests and available CHECK_MODE_TYPE_CAST feature. I'd expect highlighting it and enforcing prevention of validator misusage. Either via strict type or feedback (e.g. exception) on unexpected input.

I would probably keep the backwards compatibility and merge the fix to keep upgrades easy.

@DannyvdSluijs
Copy link
Collaborator

DannyvdSluijs commented Feb 20, 2025

Looking more detailed into this in code I can find small slithers of code or docs that refer to associative arrays

// fileName: README.md
| `Constraint::CHECK_MODE_TYPE_CAST` | Enable fuzzy type checking for associative arrays and objects |

In the test I can see test called testInvalidCasesUsingAssoc and testValidCasesUsingAssoc where we do a json_decode($input, true) decode with associative: true

Trying to look at the PR from the reporter side, @aistis- took the time to report a bug he ran into since the 6.1.0 release. Although the input provided makes for some weird JSON and PHP does type juggling between JSON and associative arrays in both ways I now believe we should accept the PR, with a side note we are doing this to fix the regression introduced in 6.1.0 and we are still not 100% onboard with supporting associative arrays.

The next mayor version might be where we drop associative arrays al together, I have opened up a discussion on that topic.

@erayd I would like to give you the opportunity to have a final say in this matter before I decide to merge the PR.


For future reference:

// JSON Object with string property containing only numbers in associative array does type juggling from string to integer
var_dump(json_decode('{"123": "a"}', true));

// array(1) {
//  [123]=>
//  string(1) "a"
// }

// Associative array declaration with string key having only numbers does type juggling from string to integer
var_dump(['123' => 'a']);

// array(1) {
//  [123]=>
//  string(1) "a"
// }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
issue-isn't-ours The issue isn't in this project; please contact the project that actually contains the problem. Not a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants