Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] [need help]validate_on_save's behavior #775

Closed
hd10180 opened this issue Nov 10, 2023 · 5 comments
Closed

[BUG] [need help]validate_on_save's behavior #775

hd10180 opened this issue Nov 10, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@hd10180
Copy link

hd10180 commented Nov 10, 2023

Describe the bug
please regret for my bad english.

beanie's behavior maybe not the same as before.
i always use model_copy(update=schema.model_dump()) to update an document (it used to be .copy(update=schema.dict())),
but i found the data in database not the same as use ODM, eg: PydanticObjectId will be dump to str after use model_dump(), it used to be ObjectId in database if i set validate_on_save=True, it seems change to an str now.

For this changes the data in the database (which I think is more important) that I create an issue instead of discussion.
maybe relate to #664 , and i use the Reproduce code on #664

To Reproduce

import asyncio

from beanie import Document, PydanticObjectId, init_beanie
from motor.motor_asyncio import AsyncIOMotorClient
from pydantic import BaseModel, Field


class Base(Document):
    id: PydanticObjectId = Field(default_factory=PydanticObjectId)


class Tag(Base):
    number: int
    related: PydanticObjectId

    class Settings:
        name = "Tag"
        validate_on_save = True


class UpdateSchema(BaseModel):
    id: PydanticObjectId | None = None
    number: int | None = None
    related: PydanticObjectId | None = None


client = AsyncIOMotorClient("mongodb://localhost:27017")
db = client["test_beanie"]


async def init():
    await init_beanie(database=db, document_models=[Tag])


async def start_test():
    await init()
    tag_new = Tag(number=1, related=PydanticObjectId())
    await tag_new.create()
    print(tag_new)

    tag2 = await Tag.find_one(Tag.id == tag_new.id)
    print(f"find after create: {tag2}")

    update = UpdateSchema(related=PydanticObjectId())
    tag2 = tag2.model_copy(update=update.model_dump(exclude_unset=True))
    tag2.number = 999
    print(f"after model_copy: {tag2}")

    print("_______Do Save_________")
    await tag2.save()
    print(f"after save: {tag2}")

    tag3 = await Tag.find_one(Tag.id == tag_new.id)
    print(f"actually in database use beanie find_one: {tag3}")
    raw = await db[Tag.get_settings().name].find_one({"_id": tag_new.id})
    print(f"actually in database use motor find_one: {raw}")


async def main():
    await start_test()


if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())

Expected behavior
the field must be an ObjectId but now it is str

Additional context
it make me confuse because the different type between database and model

@hd10180 hd10180 changed the title [BUG] validate_on_save's behavior [BUG] [need help]validate_on_save's behavior Nov 10, 2023
@hd10180
Copy link
Author

hd10180 commented Nov 10, 2023

This is my solution, but I don't know if it lacks necessary considerations.

DocType = TypeVar("DocType", bound="Base")


class Base(Document):
    id: PydanticObjectId = Field(default_factory=PydanticObjectId)

    def model_copy(
        self: DocType, *, update: dict[str, Any] | None = None, deep: bool = False
    ) -> DocType:
        """overwrite model_copy, re-init model"""
        copied = super().model_copy(update=update, deep=deep)
        return self.__class__(**copied.model_dump())

@roman-right
Copy link
Member

Hi @hd10180 ,
Thank you for the catch! It looks critical. I'll check it tomorrow.

@roman-right roman-right added the bug Something isn't working label Nov 10, 2023
@roman-right
Copy link
Member

Reproduced.
It's an interesting coincidence. The model_copy with an update doesn't validate what comes from the update on the Pydantic side:

update: Values to change/add in the new model. Note: the data isn't validated
                before creating the new model. You should trust this data.

On the Beanie side, validation before save doesn't update the document, only checks if fields are valid. PydanticObjectId should be representable both as str (to be able to receive JSON data) and as a binary object. It accepts str, and it's okay with it.

Thanks for the catch.

You can try the fixed version here: #776

I'll add the respective tests tomorrow and then publish it.

@bkis
Copy link

bkis commented Dec 4, 2023

I know I'm late as @roman-right already merged a fix, so I actually hope that what I'm doing here is superfluous. Two days ago I had a similar problem and prepared an as-close-to-reality-as-possible MRE for an issue here. Maybe it helps people looking for answers in the future.

This is very similar to the example posted here (and even the test in #776, actually):

import asyncio

from beanie import Document, PydanticObjectId, init_beanie
from motor.motor_asyncio import AsyncIOMotorClient
from pydantic import BaseModel


class Resource(Document):
    shares: list[PydanticObjectId] = []

    class Settings:
        name = "resources"
        validate_on_save = True


class ResourceUpdate(BaseModel):
    """Update model (for CRUD endpoints) with all fields optional"""

    shares: list[PydanticObjectId] | None = None


class User(BaseModel):
    id: PydanticObjectId


async def main():
    # setup DB
    client = AsyncIOMotorClient("mongodb://127.0.0.1:27017")
    await init_beanie(
        database=client.some_db,
        document_models=[Resource],
    )
    client.some_db["something"].drop()

    # create dummy user
    user = User(id="654ba3a6ec7833e469dde77a")

    # create dummy resource
    res = await Resource().create()

    # create update model that would be request model of a PATCH endpoint
    updates = ResourceUpdate(shares=[user.id])
    # update resource with the passed updates - to apply all the updates at once, the
    # model must be dumped at some point if we don't want to loop through its attributes
    # and apply them one by one, so the updated values all get serialized here
    # (PydanticObjectId -> str)
    await Resource.find_one(Resource.id == res.id).set(
        updates.model_dump(exclude_unset=True)
    )

    # at this point, the value for `shares` in the DB is a list of strings :(
    # so now if there's a request asking for all resources that are shared with
    # our user, we have to convert the ID we're querying for to str (not pretty!)

    # find resources shared with user (using `user.id` and not `str(user.id)`!)
    res = await Resource.find_one(Resource.shares == user.id, with_children=True)

    print(res)  # this would print "None" without the fix

if __name__ == "__main__":
    asyncio.run(main())

So yes, I agree, this bug is rather critical as it breaks a very common webapp data handling strategy (or at least makes it very hacky to get it to work). Looking forward to the release of the fix!

@hd10180
Copy link
Author

hd10180 commented Feb 27, 2024

fixed #776

@hd10180 hd10180 closed this as completed Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants