Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Allow unsetting of backed property hooks #17922

Open
acabal opened this issue Feb 25, 2025 · 10 comments
Open

Feature request: Allow unsetting of backed property hooks #17922

acabal opened this issue Feb 25, 2025 · 10 comments

Comments

@acabal
Copy link

acabal commented Feb 25, 2025

Description

(I hope this is the right place to suggest this, if not I'm happy to move it elsewhere.)

In the new property hooks feature of PHP 8.4, I think it would be very useful to be able to unset() backed properties.

Consider the following object, a forums post with a user object representing the poster which is lazy-loaded with a getter:

<?php
class ForumsPost{
    public int $UserId;
    public User $User{
        get{
            if(!isset($this->User)){
                $this->User = Db::GetUser($this->UserId);
            }

            return $this->User;
        }
    }
}

$fp = new ForumsPost();
$fp->UserId = 1;
print($fp->User->UserId); // `User` is retrieved from the database and `1` is printed.

Now suppose we have already retrieved the User property, and later we change the UserId. Now the UserId property and the User property are now out of sync. What if we could add a setter hook for the UserId property, to unset() any cached User object at the moment when we set the UserId? For example:

<?php
class ForumsPost{
    public int $UserId{
        set(int $value){
            // Set the new `UserId`...
            $this->UserId = $value;

            // And unset any cached `User` we may have fetched earlier...
            unset($this->User); // Error! Not allowed in PHP 8.4, but natural to think about and very convenient!
        }
    }

    public User $User{
        get{
            if(!isset($this->User)){
                $this->User = Db::GetUser($this->UserId);
            }

            return $this->User;
        }
    }
}

$fp = new ForumsPost();
$fp->UserId = 1;
print($fp->User->UserId); // `User` is retrieved from the database and `1` is printed.

$fp->UserId = 2;
print($fp->User->UserId); // Expected to print `2`, but actually is an error because we cannot `unset()` the backed `User` property.

The current state of affairs means that if we want to return a backed property to a "not initialized" state, we have to use kludges like making it a nullable type and calling null "not initialized" by convention, or setting it to say an empty string and calling that "not initialized" by convention. Both of these are poor, semantically wrong solutions, especially considering it's already allowed to unset() plain object properties directly.

Additionally, the consumer of an object might find the following behavior surprising:

<?php
$fp = new ForumsPost();
unset($fp->UserId); // Success!
unset($fp->User); // Error! Huh, why? (`$User` is a property hook...)

The consumer of an object should not need to know the internal implementation of properties. In this example it is well-established in PHP that we can unset() object properties, but since plain properties and property hooks are indistinguishable to the consumer, unset() on a property can result in an error if the internal implementation is a property hook. This is surprising to the consumer, who doesn't and shouldn't care how the property is implemented internally.

In the RFC for property hooks, unset() is explicitly disallowed because "it would involve bypassing any set hook that is defined". But I'm not sure why unset()ing something would invoke the setter.

@devnexen
Copy link
Member

cc @iluuu1994

@mvorisek
Copy link
Contributor

related with #9389, there should be ReflectionProperty::uninitialize() method

our usecase is https://github.com/atk4/core/blob/5.2.0/src/Phpunit/TestCase.php#L115 - we want to reuse some objects, ie. reset them to initial state, currently, it is not possible

@iluuu1994
Copy link
Member

In the RFC for property hooks, unset() is explicitly disallowed because "it would involve bypassing any set hook that is defined". But I'm not sure why unset()ing something would invoke the setter.

set is a way to observe changes to a property. unset is just another way to do that, just with the internal IS_UNDEF value. Our RFC tries to guarantee that if a set hook is present, indirect forms of modification of the property value are prevented.

Also note that unset() also has other weird semantics. Most significantly, unset() properties will fall back to __get and __set when accessed, and it's unclear to me how exactly this would interact with hooks.

related with #9389, there should be ReflectionProperty::uninitialize() method

I think that would be reasonable, although the magic method question above remains.

@acabal
Copy link
Author

acabal commented Feb 25, 2025

Also note that unset() also has other weird semantics. Most significantly, unset() properties will fall back to __get and __set when accessed, and it's unclear to me how exactly this would interact with hooks.

I don't have any insight into PHP internals, but as a developer if I unset() a property hook I would expect its backing value to revert to "unitialized", and subsequent accesses to pass through the getters and setters just like at first - not through __get. The latter behavior would be very unexpected.

Another issue in favor of unset() that just came up in our real-world usage: if there is complex logic in a getter that caches values, without unset() that logic can never be accessed again. Consider this example:

<?php
class ForumsPost{
	public string $Contents;
	public int $WordCount{
		get{
			if(!isset($this->WordCount)){
				$this->WordCount = /* Some complex, expensive logic working on `ForumsPost::$Contents`... */;
			}

			return $this->WordCount;
		}
	}
}

$fp = new ForumsPost();
$fp->Contents = 'Foo';
print($fp->WordCount); // Prints `1`.

$fp->Contents = 'Foo Bar';
print($fp->WordCount); // Wrong, still prints `1`. We have to unset `ForumsPost::$WordCount` to recalculate it...

unset($fp->WordCount); // Error - not allowed in PHP 8.4!

print($fp->WordCount); // Would print `2` if `unset()` were allowed.

In this example, the logic to calculate and cache the word count is in the getter for ForumsPost::$WordCount. The getter caches the value because the calculation is expensive. But since we can't unset backed property hooks, we can never recalculate the word count again!

The workaround is to put the word count logic in a separate public method accessible to both the getter and other callers, just like we would have done if property hooks didn't exist at all. But doesn't that kind of defeat the point of property hooks? Now we've added extra methods merely to work around a language limitation, and the caller must be aware enough to determine when a recalculation is desirable and remember to call those methods in those cases. We're in a position not much different than if property hooks didn't exist at all.

IMHO much of the benefit of property hooks is being able to connect logic to the getting of and changing of properties, so that the gory details of their calculation are hidden from the caller. If we have to resort to exposing extra public methods because we can't unset() backed property hooks, much of the value of property hooks goes away.

@iluuu1994
Copy link
Member

iluuu1994 commented Feb 25, 2025

I don't have any insight into PHP internals, but as a developer if I unset() a property hook I would expect its backing value to revert to "unitialized", and subsequent accesses to pass through the getters and setters just like at first - not through __get. The latter behavior would be very unexpected.

This is the default behavior for properties. https://3v4l.org/W3SAZ Since backing properties are really just properties and follow the same rules, this would also apply there. But as you say, most people would not expect that. It's mostly problematic for set, where it would become impossible to assign the value again without reflection.

We also need to differentiate between unset() inside hooks, and unset() outside hooks. They would have different purposes, but your examples seem to be mixing them. The former is essentially just an extended API (an unset hook), while the latter would be used only inside hooks for things like cache invalidation. And I feel strongly that this API should be opt-in, rather than the default.

The workaround is to put the word count logic in a separate public method accessible to both the getter and other callers, just like we would have done if property hooks didn't exist at all. But doesn't that kind of defeat the point of property hooks?

You can do:

class C {
    private $_prop;
    public $prop {
        get => $this->_prop ??= <initialize>;
    }
    public function invalidateCache() {
        unset($this->_prop);
    }
}

Yes, this requires a separate method for cache invalidation (not for accessing), but it's certainly a cleaner API than undocumented unset() calls that can break if the code is refactored (e.g. the property is turned into a virtual one).

@acabal
Copy link
Author

acabal commented Feb 25, 2025

We also need to differentiate between unset() inside hooks, and unset() outside hooks. They would have different purposes, but your examples seem to be mixing them. The former is essentially just an extended API (an unset hook), while the latter would be used only inside hooks for things like cache invalidation. And I feel strongly that this API should be opt-in, rather than the default.

Not sure why that would be. From my perspective as a user of PHP and not a maintaner, if I unset() an object property, I want it returned to an "uninitialized" state in all cases, regardless if it's called inside or outside of a property hook. However it might indeed be useful to have an unsetter property hook in the same way we have __unset(). But if we were to add that, then we might as well complete the list and add an issetter property hook. (Though I think that's beyond the scope of this particular request, and isset()'s tragic confusion of uninitialized and null always made __isset() feel of limited usefulness in a situation where one might benefit the most from property hooks, like an ORM.)

Yes, this requires a separate method for cache invalidation (not for accessing), but it's certainly a cleaner API than undocumented unset() calls that can break if the code is refactored (e.g. the property is turned into a virtual one).

Indeed. But in my view if we're forced to add methods to operate on property hook internals - which ideally should be totally hidden from the caller - we may as well not use property hooks at all because we're no better off than the pre-8.4 state of things. In fact we might be worse off, because now we're mixing new-style property hooks with public methods that operate on logic hidden in property hooks, which I think is much more confusing. The caller now requires intimate knowledge of object internals which by all rights should be hidden, and the attention to detail to use those methods consistently every time. That will result in bugs and confusing code.

In fact I would argue that in the case of cacheable property hooks, it would be clearer to simply not use property hooks in their current state at all because of this issue. But that's not what I want - I like property hooks! (I was a C# developer decades ago where I grew to like what they called "accessors".)

Yes, this requires a separate method for cache invalidation (not for accessing), but it's certainly a cleaner API than undocumented unset() calls that can break if the code is refactored (e.g. the property is turned into a virtual one).

I don't think cost to refactor should be a major consideration. Refactoring will always be hard, but the cost of an incomplete implementation of property hooks is much higher - it can result in them not being used at all, like we've decided to do!

@iluuu1994
Copy link
Member

I don't think we'll reach consensus here. I would encourage you to start a discussion on the mailing list (GH is used for bugs or simple feature requests) if you feel this should be pursued. In any case, it will require an RFC.

I actually think unset() is a very bad design choice in general. It is not type safe (because it introduces an implicit "undefined" state into the type union), and it's almost always better to be explicit. To be fair, in this case the state doesn't leak the get hook but the caller still needs to know that calling unset() and then immediately accessing the property again is safe and won't trigger an "uninitialized property" error.

But in my view if we're forced to add methods to operate on property hook internals - which ideally should be totally hidden from the caller - we may as well not use property hooks at all

To be clear, it was never our intention to cover 100% of the use-cases. Lazy properties that require cache invalidation go into the niche category in my book, and while you can still solve this with property hooks, you'll need a separate cache property.

I don't think cost to refactor should be a major consideration.

It's about breaking APIs, rather than refactoring. unset() leaks implementation details, at least without an explicit unset hook.

@acabal
Copy link
Author

acabal commented Feb 25, 2025

I actually think unset() is a very bad design choice in general.

Perhaps, but it's what we have and it's not going anywhere. I think isset() is also tragically flawed because it confuses uninitialized with null - but to my eternal frustration nobody's changing that either.

Lazy properties that require cache invalidation go into the niche category in my book,

I would push back on this a little - lazy loading and caching of objects is very common in ORMs and is very useful when converting DB rows into complex objects that don't need every single sub-object retrieved from the DB at once. I would not consider that a niche use case. In fact I would call object retrieval and caching one of the primary use cases of property hooks.

IMHO PHP isn't going to be making any drastic changes like removing uninitialized or unset(), and in fact unset() is very useful given PHP's current state of evolution, whether we like it or not; so when extending the language we should embrace useful warts like unset() instead of rejecting them as ideologically impure. Doing so results in a language that is inconsistent with itself. PHP's flexibility with types and objects is part of what made it so popular in the first place. (And also what makes it sometimes frustrating and easy to write bad code in!)

From my end, we will not be using property hooks at all because of this issue, which is a great pity because I love the idea and want to use it!

@iluuu1994
Copy link
Member

Perhaps, but it's what we have and it's not going anywhere. I think isset() is also tragically flawed because it confuses uninitialized with null - but to my eternal frustration nobody's changing that either.

As mentioned, IMO at least, the main issue is that properties should not be uninitialized at all. If some properties remain uninitialized after constructions, there's no way they'll know through the API without reading your code. If "uninitialized" is a possibility that they need to handle, their properties should be nullable, or some other union type.

In fact I would call object retrieval and caching one of the primary use cases of property hooks.

Lazy loading doesn't pose a problem though. It's only cache invalidation from outside the property. But IMO this leaks implementation details.

From my end, we will not be using property hooks at all because of this issue, which is a great pity because I love the idea and want to use it!

I don't quite follow this reasoning. C# doesn't offset unset. Actually, you would use null for this case, which you can do with property hooks. The downside is that you cannot hint the backing property to be nullable while making the getter not nullable, I'll admit. That said, in C# you'll get the implicit nullability for every reference type, which is much worse.

Anyway, I think I'll leave it here with the discussion, since it doesn't look like we'll be on the same page regarding this issue. 🙂 That's why I suggested consulting the mailing list. Maybe other people have other view points.

@acabal
Copy link
Author

acabal commented Feb 25, 2025

I don't quite follow this reasoning. C# doesn't offset unset. Actually, you would use null for this case, which you can do with property hooks. The downside is that you cannot hint the backing property to be nullable while making the getter not nullable, I'll admit. That said, in C# you'll get the implicit nullability for every reference type, which is much worse.

We can't use null to mean uninitialized, because null is a valid initialized value that can come from the DB, and is a distinct concept from uninitialized. (See my earlier complaint - and complaints going back decades - about isset().) This is common pattern in real-world data whether we like it or not. Additionally that would mean that every property would have to be declared as nullable, which may not be actually true; but PHP has uninitialized already sitting right there, as part of the language, in common use!

Our reasoning for rejecting property hooks is because if we were to switch our codebase to property hooks, it should be an all-in update, not one that uses hooks for some things but not for others because of this (or any other) limitation that leaves the language inconsistent with itself. I created this issue because this problem actually happened in our real-world codebase as we were exploring property hooks. A codebase using a feature some times but not others will be inconsistent and much more confusing to read. Consistency is more important than cool features.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants