Cmatrix1/Python-Descriptors

Python Descriptors

Complete Tutorial of Python Descriptors

Introduction

Welcome to the intriguing world of Python descriptors! This article may seem a bit complex, but trust me, understanding descriptors is absolutely worth it.

They unlock the inner workings of properties, methods, slots, and even functions, providing a solid foundation for Python mastery.

In this section, we'll primarily focus on two types of descriptors: non-data descriptors and data descriptors. These two types behave slightly differently, and we'll delve into their distinctions.

As we progress, we'll learn how to craft and utilize custom data descriptors. You'll discover the unique advantages and practical applications of writing your own data descriptors.

Additionally, we'll navigate through some common pitfalls related to storing data associated with data descriptors. By learning how to avoid these pitfalls, you'll enhance your descriptor implementation skills.

As we explore this section further, we'll tackle advanced topics such as weak references and their relationship to weak dictionaries. These concepts will broaden your understanding and empower you to create more robust and efficient code. 😄

There's no denying that this article is packed with captivating content. So, without further ado, let's dive in and unravel the mysteries of Python descriptors together! Thank you for joining me.

Problem We Want to Solve

Suppose we want a User class whose age must always be an integer and whose name must always be a string.

Of course, we can use a property with getter and setter methods.

Let's implement it with properties first:

class User:
    @property
    def name(self):
        return self._name
    
    @name.setter
    def name(self, value):
        self._name = str(value)

    @property
    def age(self):
        return self._age
    
    @age.setter
    def age(self, value):
        self._age = int(value)

This is tedious, repetitive boilerplate code! We need a better way.

Descriptors

And this is where the descriptor protocol comes in!

Python descriptors are simply objects that implement the descriptor protocol.

The protocol consists of the following special methods (not all are required):

  • __get__: retrieve the property value
  • __set__: store the property value
  • __delete__: delete a property from the instance (not to be confused with __del__, the object finalizer)
  • __set_name__: capture the property name as it is being defined
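To make the four hooks concrete, here is a minimal sketch that records which one fires for each operation. The names Traced and Demo are made up for illustration, and keeping the value in the instance's __dict__ is just one possible choice here, not a recommendation.

```python
class Traced:
    """Hypothetical descriptor that records which protocol hook fired."""

    def __set_name__(self, owner_class, name):
        # Called once, when the body of the owning class is executed.
        self.name = name
        self.calls = [f'__set_name__({name})']

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        self.calls.append('__get__')
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        self.calls.append('__set__')
        instance.__dict__[self.name] = value

    def __delete__(self, instance):
        self.calls.append('__delete__')
        instance.__dict__.pop(self.name, None)


class Demo:
    attr = Traced()


d = Demo()
d.attr = 42      # triggers __set__
value = d.attr   # triggers __get__
del d.attr       # triggers __delete__
print(Demo.attr.calls)
# ['__set_name__(attr)', '__set__', '__get__', '__delete__']
```

Accessing Demo.attr on the class returns the descriptor itself, which is how the recorded calls can be inspected.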

There are two types of Descriptors:

  1. Non-data descriptors: these are descriptors that only implement __get__ (and optionally __set_name__)
  2. Data descriptors: these implement __set__ and/or __delete__, and normally also __get__
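One practical consequence of this distinction is attribute lookup order: an entry in the instance dictionary shadows a non-data descriptor, while a data descriptor takes precedence over the instance dictionary. The sketch below illustrates this with two hypothetical descriptors (NonData and Data are illustrative names, not part of any library).

```python
class NonData:
    def __get__(self, instance, owner_class):
        return 'from non-data descriptor'


class Data:
    def __get__(self, instance, owner_class):
        return 'from data descriptor'

    def __set__(self, instance, value):
        raise AttributeError('read-only')


class Example:
    a = NonData()
    b = Data()


e = Example()
# Write straight into the instance dictionary, bypassing normal assignment.
e.__dict__['a'] = 'from instance dict'
e.__dict__['b'] = 'from instance dict'

print(e.a)  # from instance dict      (instance dictionary wins)
print(e.b)  # from data descriptor    (data descriptor wins)
```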

Let's create a simple non-data descriptor:

from datetime import datetime

class TimeUTC:
    def __get__(self, instance, owner_class):
        return datetime.utcnow().isoformat()

So TimeUTC is a class that implements the __get__ method only, and is therefore considered a non-data descriptor.

We can now use it to create properties in other classes:

class Logger:
    current_time = TimeUTC()

Note that current_time is a class attribute:

Logger.__dict__
mappingproxy({'__module__': '__main__',
              'current_time': <__main__.TimeUTC at 0x7fdcd84bbd68>,
              '__dict__': <attribute '__dict__' of 'Logger' objects>,
              '__weakref__': <attribute '__weakref__' of 'Logger' objects>,
              '__doc__': None})

We can access that attribute from an instance of the Logger class:

l = Logger()
l.current_time
'2023-10-27T08:11:58.319429'

We can also access it from the class itself, and for now it behaves the same (we'll come back to that later):

Logger.current_time
'2023-10-27T08:11:58.327000'

Let's consider another example.

Suppose we want to create a class that lets us pick a random status and a random favorite color for a Person instance. We could approach it this way:

from random import choice, seed

class Person:
    @property
    def status(self):
        return choice(('😃', '😄', '😊', '😉', '😍', '🤩', '😎', '🥳', '😇', '🙌'))
        
    @property
    def favorite_color(self):
        colors = ('🔴', '🟠', '🟡', '🟢', '🔵', '🟣', '🟤', '⚫', '⚪', '🌈')
        return choice(colors)

This was pretty easy, but as you can see both properties essentially did the same thing - they picked a random choice from some iterable. Let's rewrite this using a custom descriptor:

class Choice:
    def __init__(self, *choices):
        self.choices = choices
        
    def __get__(self, instance, owner_class):
        return choice(self.choices)

And now we can rewrite our Person class this way:

class Person:
    status = Choice('😃', '😄', '😊', '😉', '😍', '🤩', '😎', '🥳', '😇', '🙌')
    favorite_color = Choice('🔴', '🟠', '🟡', '🟢', '🔵', '🟣', '🟤', '⚫', '⚪', '🌈')

Of course we are not limited to the Person class; we could use it in other classes too:

class Dice:
    die_1 = Choice(1,2,3,4,5,6)
    die_2 = Choice(1,2,3,4,5,6)
    die_3 = Choice(1,2,3,4,5,6)

Getters and Setters

Getters

So far we have seen how the __get__ method is called when we access a class attribute that holds a descriptor instance. We can access that attribute either from the class itself or from an instance and, as we saw earlier, both accesses end up calling the __get__ method. So, when __get__ is called, we may want to know:

  • which instance was used (if any) -> None if called from the class
  • what class owns the TimeUTC (descriptor) instance -> Logger in our case

This is why we have the signature __get__(self, instance, owner_class). It lets us return different values from __get__ depending on whether it was:

  • called from the class
  • called from an instance

from datetime import datetime

class TimeUTC:
    def __get__(self, instance, owner_class):
        print(f'__get__ called, self={self}, instance={instance}, owner_class={owner_class}')
        return datetime.utcnow().isoformat()

class Logger1:
    current_time = TimeUTC()

Now let's access current_time from the class itself:

Logger1.current_time
'__get__ called, self=<__main__.TimeUTC object at 0x0000026BF3C87210>, instance=None, owner_class=<class '__main__.Logger1'>'
'2023-10-27T09:58:00.586792'

As you can see, the instance was None - this was because we called the descriptor from the Logger1 class, not an instance of it. The owner_class tells us this descriptor instance is defined in the Logger1 class.

But if we call the descriptor via an instance instead:

l1 = Logger1()
l1.current_time
'__get__ called, self=<__main__.TimeUTC object at 0x0000026BF3C87210>, instance=<__main__.Logger1 object at 0x0000026BF41DEC50>, owner_class=<class '__main__.Logger1'>'

Very often, we choose to:

  • return the descriptor (TimeUTC) instance when called from the class itself (Logger), which gives us an easy handle to the descriptor instance
  • return the attribute value when called from an instance of the class

from datetime import datetime

class TimeUTC:
    def __get__(self, instance, owner_class):
        if instance is None:
            # called from class
            return self
        else:
            # called from instance
            return datetime.utcnow().isoformat()

class Logger:
    current_time = TimeUTC()

Logger.current_time
'<__main__.TimeUTC at 0x26bf41df350>'

l = Logger()
l.current_time
'2023-10-27T09:58:00.644233'

This is consistent with the way properties work:

class Logger:
    @property
    def current_time(self):
        return datetime.utcnow().isoformat()

Logger.current_time
'<property at 0x26bf420c950>'

This returned the property instance, whereas calling it from an instance returns the value:

l = Logger()
l.current_time
'2023-10-27T09:58:00.661615'
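To see that dotted access on a class attribute really goes through __get__, we can fetch the raw object from the class dictionary and call __get__ by hand. This is only an illustrative sketch; Fixed and Holder are hypothetical names.

```python
class Fixed:
    """Hypothetical non-data descriptor that returns a constant."""

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        return 'fixed value'


class Holder:
    attr = Fixed()

    @property
    def prop(self):
        return 'property value'


h = Holder()

# Fetch the raw objects from the class dictionary, bypassing the protocol.
raw_attr = Holder.__dict__['attr']
raw_prop = Holder.__dict__['prop']

# Calling __get__ by hand gives the same result as normal dotted access.
print(raw_attr.__get__(h, Holder) == h.attr)       # True
print(raw_prop.__get__(h, Holder) == h.prop)       # True
print(raw_prop.__get__(None, Holder) is raw_prop)  # True: class access returns the property
```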

Now, there is one subtle point we have to understand when we create multiple instances of a class that uses a descriptor as a class attribute.

Since the descriptor is assigned to a class attribute, all instances of the class will share the same descriptor instance!

class TimeUTC:
    def __get__(self, instance, owner_class):
        if instance is None:
            # called from class
            return self
        else:
            # called from instance
            print(f'__get__ called in {self}')
            return datetime.utcnow().isoformat()
        
class Logger:
    current_time = TimeUTC()

l1 = Logger()
l2 = Logger()

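But look at which descriptor handles current_time for each of those instances. Comparing l1.current_time and l2.current_time directly would not help, since each access returns a freshly built string, so we compare the descriptor objects found through each instance's class:

type(l1).__dict__['current_time'] is type(l2).__dict__['current_time']
> True

As you can see, the same instance of TimeUTC is used. This does not matter in this particular example, since we just return the current time, but watch what happens if our property relies on some kind of state in the descriptor: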

class Countdown:
    def __init__(self, start):
        self.start = start + 1
        
    def __get__(self, instance, owner):
        if instance is None:
            return self
        else:
            self.start -= 1
            return self.start

class Rocket:
    countdown = Countdown(10)

Now let's say we want to launch two rockets:

rocket1 = Rocket()
rocket2 = Rocket()

And let's start the countdown for each one:

rocket1.countdown
> 10
rocket2.countdown
> 9
rocket1.countdown
> 8

As you can see, the current countdown value is shared by both rocket1 and rocket2 instances of Rocket - this is because the Countdown instance is a class attribute of Rocket. So we have to be careful how we deal with instance level state.

Setters

The __set__ method works in a similar way to __get__, but it is used when we assign a value to the attribute through an instance. The __set__ signature is as follows: self, instance, value

  • instance: the instance the set method was called from
  • value: the value we want to assign to the attribute

You'll notice there is no owner_class like we have in the __get__ method: setters (and deleters) are always called from instances.

class IntegerValue:
    def __set__(self, instance, value):
        print(f'__set__ called, instance={instance}, value={value}')
        
    def __get__(self, instance, owner_class):
        if instance is None:
            print('__get__ called from class')
        else:
            print(f'__get__ called, instance={instance}, owner_class={owner_class}')

class Point2D:
    x = IntegerValue()
    y = IntegerValue()

p = Point2D()
p.x = 100
> '__set__ called, instance=<__main__.Point2D object at 0x000001E7564084D0>, value=100'
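The following sketch illustrates why __set__ has no owner_class parameter: with an ordinary class (no custom metaclass), assigning to the attribute on the class itself does not call __set__ at all. It simply rebinds the class attribute and replaces the descriptor. Recorder is a hypothetical descriptor used only for this demonstration.

```python
class Recorder:
    """Hypothetical data descriptor that remembers the last __set__ call."""

    def __init__(self):
        self.last_call = None

    def __set__(self, instance, value):
        self.last_call = (instance, value)

    def __get__(self, instance, owner_class):
        return self


class Point2D:
    x = Recorder()


descriptor = Point2D.x
p = Point2D()

p.x = 100                                # goes through __set__
print(descriptor.last_call == (p, 100))  # True

Point2D.x = 5                            # plain class attribute assignment
print(descriptor.last_call == (p, 100))  # True: __set__ was not called again
print(Point2D.__dict__['x'])             # 5: the descriptor has been replaced
```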

Caveat with Set and Delete (and Get)

Notice that we have only created a single instance of the TimeUTC descriptor:

class Logger:
    current_time = TimeUTC()

So what happens when we do this?

l1 = Logger()
l2 = Logger()

Any instance of Logger will be referencing the same instance of TimeUTC. In this case it does not matter, because __get__ just returns the current UTC time. But what happens when we have to "store" and "retrieve" data from the instances?

Suppose IntegerValue is a data descriptor, i.e. it implements the __get__ and __set__ methods:

class Point2D:
    x = IntegerValue()
    y = IntegerValue()
# two separate instances of IntegerValue assigned to the class attributes x and y

p1 = Point2D()
p2 = Point2D()
# two separate instances of Point2D

But what object does p1.x reference? The class attribute x. And what about p2.x? The same class attribute x (the same instance of IntegerValue).

We have to be mindful of which instance we are "storing" the data for. This is one of the reasons both __get__ and __set__ need to know the instance.

Storing

So, where should we store the values x and y in the previous example?

Approach 1

Many "tutorials" I see on the web naively store the value in the descriptor itself:

class IntegerValue:
    def __set__(self, instance, value):
        self._value = int(value)
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            return self._value

class Point2D:
    x = IntegerValue()
    y = IntegerValue()

At first blush, this seems to work just fine:

p1 = Point2D()
p2 = Point2D()

p1.x = 1.1
p1.y = 2.2

print(p1.x, p1.y)
> 1 2

But, remember the point I was making about the instance of the descriptor (IntegerValue in this case) being shared by all instances of the class (Point2D in this case)?

p2 = Point2D()
print(p2.x, p2.y)
> 1 2

And of course if we set the value:

p2.x = 100.9
print(p2.x, p1.x)
> 100 100

So storing the value on the descriptor instance itself does not work for instance-level data: every instance of the class shares it. This is the reason both the __get__ and __set__ methods need to know which instance we are dealing with.

Approach 2

Assuming the instances are hashable, we can use a dictionary in the descriptor to store instance-specific values, keyed by the instance.

class IntegerValue:
    def __init__(self):
        self.values = {}
        
    def __set__(self, instance, value):
        self.values[instance] = int(value)
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            return self.values.get(instance)

class Point2D:
    x = IntegerValue()
    y = IntegerValue()

p1 = Point2D()

p1.x = 10.1
p1.y = 20.2

p1.x, p1.y
> (10, 20)

In fact, we can see the dictionary in the descriptor instances:

Point2D.x.values
> {<__main__.Point2D at 0x212d93d9710>: 10}
Point2D.y.values
> {<__main__.Point2D at 0x212d93c2110>: 20}

In both of these, the key is our p1 object. We can now create a second point, and go through the same steps:

p2 = Point2D()
p2.x = 100.1
p2.y = 200.2

Point2D.x.values
> {<__main__.Point2D at 0x212d93b2290>: 10,
> <__main__.Point2D at 0x212d931c8d0>: 100}

Point2D.y.values
> {<__main__.Point2D at 0x212d93b2290>: 20,
> <__main__.Point2D at 0x212d931c8d0>: 200}

And everything works just fine (or does it?):

p1.x, p1.y, p2.x, p2.y
> (10, 20, 100, 200)

We actually have a potential memory leak: notice how the dictionary in the descriptor instance also stores a reference to the point object, as a key in the dictionary.

Let's write a simple utility function that allows us to get the reference count for an object given its id (this only makes sense if the id still refers to a valid, non-destroyed object):

import ctypes

def ref_count(address):
    return ctypes.c_long.from_address(address).value

p1 = Point2D()
id_p1 = id(p1)

ref_count(id_p1)
> 1

Now let's set the x property of p1:

p1.x = 100.1

And let's check the ref count again:

ref_count(id_p1)
> 2

As you can see, it's now 2. If we delete our main reference to p1, the one in our global namespace:

'p1' in globals()
> True
del p1
'p1' in globals()
> False
ref_count(id_p1)
> 1

And our reference count is still 1, which means the object itself has not been destroyed! In fact, we can see that object referenced in our data descriptor dictionary:

Point2D.x.values.items()
> dict_items([(<__main__.Point2D object at 0x00000212D93B2290>, 10), (<__main__.Point2D object at 0x00000212D931C8D0>, 100), (<__main__.Point2D object at 0x00000212D9390150>, 100)])

As you can see, the last element's key has the same id as the object p1 was referencing. So, although we deleted p1, the object was not destroyed, and this can result in a memory leak. There are a few ways we can handle this issue. The first one we are going to look at is something called weak references. So let's segue into that next.

Approach 3

First let's bring back the function we can use to determine the reference count of an object by id:

import ctypes

def ref_count(address):
    return ctypes.c_long.from_address(address).value

Note that this counts the strong references to that object.

class Person:
    def __init__(self, name):
        self.name = name
        
    def __repr__(self):
        return f'Person(name={self.name})'

p1 = Person('Guido')
p2 = p1

p1_id = id(p1)
p2_id = id(p2)

p1_id == p2_id, ref_count(p1_id)
> (True, 2)

So we have two strong references. If we delete one of them:

del p2
ref_count(p1_id)
> 1

We have a strong reference count of 1 now

We can delete the last reference:

del p1

Now our reference count function will not work anymore, since the last reference to the object at that memory address was removed and that memory address is now meaningless:

ref_count(p1_id)
> -370994432650002694

Python destroys any object whose strong reference count goes down to 0. So far, we have always worked with strong references. But there is another type of reference to an object that does not affect the (strong) reference count: these are called weak references, and they are provided by the weakref module in the standard library (see the weakref module documentation).
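Before moving on to dictionaries, here is a minimal sketch of a single weak reference created with weakref.ref. It assumes CPython, where the object is destroyed as soon as the last strong reference goes away; the gc.collect() call is only a precaution for other implementations.

```python
import gc
import weakref


class Person:
    def __init__(self, name):
        self.name = name


p1 = Person('Guido')
weak = weakref.ref(p1)              # does not increase the strong reference count

print(weak() is p1)                 # True: calling the weak reference returns the object
print(weakref.getweakrefcount(p1))  # 1

del p1                              # remove the only strong reference
gc.collect()                        # not needed in CPython, harmless elsewhere
print(weak() is None)               # True: the referent is gone
```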

For our data descriptors, we want to use the instance objects as keys in our dictionary. But as we saw earlier, storing the object itself as the key can lead to memory leaks. So instead, we are going to store weak references to the object in the dictionary. We could use our own dictionary, but weakref also provides a specialized dictionary type, that will store a weak reference to the object being used as the key:

import weakref

p1 = Person('Guido')
d = weakref.WeakKeyDictionary()

ref_count(id(p1))
> 1

weakref.getweakrefcount(p1)
> 0

d[p1] = 'Guido'

Now, notice the reference counts:

ref_count(id(p1)), weakref.getweakrefcount(p1)
> (1, 1)

We still have only one strong reference, but now we have a weak reference to p1 as well! That weak reference is in the WeakKeyDictionary. We can easily see the weak references contained in that dictionary:

hex(id(p1)), list(d.keyrefs())
> ('0x7fbae83635c0', [<weakref at 0x7fbae8381958; to 'Person' at 0x7fbae83635c0>])

Now watch what happens to the dictionary when we delete the last strong reference to p1:

del p1
list(d.keyrefs())
> []

It was automatically removed when the object it was pointing to (weakly) was destroyed! Now be careful: you can only use keys in the WeakKeyDictionary that Python can create weak references to. Also, even though we are using a weak reference as a key in the dictionary, the object must still be hashable.
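The two restrictions can be seen in a short sketch. Built-in str and int objects do not support weak references, and an instance of a class that defines __eq__ without __hash__ is unhashable, so all three are rejected as keys. Plain and Unhashable are hypothetical classes used only for illustration, and the exact error messages may differ between Python versions.

```python
import weakref


class Plain:
    pass


class Unhashable:
    def __eq__(self, other):
        # Defining __eq__ without __hash__ sets __hash__ to None.
        return isinstance(other, Unhashable)


d = weakref.WeakKeyDictionary()

plain = Plain()
d[plain] = 'ok'      # regular instances support weak references and hashing
print(len(d))        # 1

errors = []
for key in ('a string', 10, Unhashable()):
    try:
        d[key] = 'value'
    except TypeError as ex:
        errors.append(type(key).__name__)
        print(f'{type(key).__name__}: {ex}')

print(errors)        # ['str', 'int', 'Unhashable']
```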

So we can certainly use WeakKeyDictionary objects in our data descriptors, but that will only work with hashable objects. Let's look at how to use WeakKeyDictionary as a storage mechanism for our data descriptors, as well as how to deal with the unhashable issue.
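As a minimal sketch of the idea, the IntegerValue descriptor from Approach 2 can swap its plain dictionary for a WeakKeyDictionary. This version assumes the instances are hashable and support weak references, so it does not address the unhashable case. The gc.collect() call is only a precaution for implementations without reference counting.

```python
import gc
import weakref


class IntegerValue:
    def __init__(self):
        # Keys are held weakly, so the descriptor does not keep instances alive.
        self.values = weakref.WeakKeyDictionary()

    def __set__(self, instance, value):
        self.values[instance] = int(value)

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        return self.values.get(instance)


class Point2D:
    x = IntegerValue()


p1 = Point2D()
p2 = Point2D()
p1.x = 10.1
p2.x = 100.9

print(p1.x, p2.x)              # 10 100
print(len(Point2D.x.values))   # 2

del p1
gc.collect()                   # not needed in CPython, harmless elsewhere
print(len(Point2D.x.values))   # 1: the entry for the deleted point is gone
```

Each instance now gets its own value, and the entry disappears together with the instance instead of lingering in the descriptor.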
