Pydantic compatibility issue #1677
Comments
Here is my real hacky workaround (no idea if it is right):
import pandas as pd
import pandera as pa
from pandera.typing import DataFrame as _DataFrame, Series
from pydantic_core import core_schema, CoreSchema
from pydantic import GetCoreSchemaHandler, BaseModel
from typing import TypeVar, Generic, Any

T = TypeVar("T")

class DataFrame(_DataFrame, Generic[T]):
    @classmethod
    def __get_pydantic_core_schema__(
        cls, source_type: Any, handler: GetCoreSchemaHandler
    ) -> CoreSchema:
        schema = source_type().__orig_class__.__args__[0].to_schema()
        type_map = {
            "str": core_schema.str_schema(),
            "int64": core_schema.int_schema(),
            "float64": core_schema.float_schema(),
            "bool": core_schema.bool_schema(),
            "datetime64[ns]": core_schema.datetime_schema()
        }
        return core_schema.list_schema(
            core_schema.typed_dict_schema(
                {
                    i: core_schema.typed_dict_field(type_map[str(j.dtype)])
                    for i, j in schema.columns.items()
                },
            )
        )

class SimpleSchema(pa.DataFrameModel):
    str_col: Series[str]

class PydanticModel(BaseModel):
    x: int
    df: DataFrame[SimpleSchema] |
@riziles @cosmicBboy any update on this pydantic compatibility issue with json schema and a possible fix in pandera? I am running into this same error in pandera 0.22.1. Looks like the fix PR did not get merged. |
Looks like #1704 addresses this, but it still has CI test errors |
Any update on this? This issue blocks generating the docs page for FastAPI. |
@ragrawal , you're welcome to take a swing at figuring out why some tests are failing. I don't have the bandwidth to work on this right now. |
@riziles -- I looked into the PR and was not able to get it working. I am having trouble setting up the development environment. Also, I don't think the PR is generic enough; it tries to handle a very special case. I don't have an in-depth understanding of pydantic or pandera. I would appreciate it if someone could suggest another hack to get past the above issue. |
This works:
import pandas as pd
import pandera as pa
from pandera.typing import DataFrame as DataFrame, Series
from pydantic import BaseModel, WithJsonSchema
from typing import Annotated
from fastapi import FastAPI

app = FastAPI()

class SimpleSchema(pa.DataFrameModel):
    str_col: Series[str]

class PydanticModel3(BaseModel):
    y: Annotated[
        DataFrame[SimpleSchema],
        WithJsonSchema(SimpleSchema.to_json_schema()),
    ]

@app.post("/input_api")
def input_this(pm3: PydanticModel3) -> list[str]:
    return pm3.y["str_col"].to_list() |
...if you specify a
import pandas as pd
import pandera as pa
from pandera.typing import DataFrame as DataFrame, Series
from pydantic import BaseModel, WithJsonSchema
from typing import Annotated
from fastapi import FastAPI

app = FastAPI()

class SimpleSchema(pa.DataFrameModel):
    str_col: Series[str]

    class Config:
        to_format = "dict"

class PydanticModel3(BaseModel):
    y: Annotated[
        DataFrame[SimpleSchema],
        WithJsonSchema(SimpleSchema.to_json_schema()),
    ]

@app.post("/input_api")
def input_this(pm3: PydanticModel3) -> PydanticModel3:
    return pm3 |
... also, you can just use
import pandas as pd
import pandera as pa
from pandera.typing import DataFrame as DataFrame, Series
from typing import Annotated
from pydantic import WithJsonSchema
from fastapi import FastAPI

app = FastAPI()

class SimpleSchema(pa.DataFrameModel):
    str_col: Series[str]

    class Config:
        to_format = "dict"

@app.post("/input_api")
def input_this(
    pm3: Annotated[
        DataFrame[SimpleSchema],
        WithJsonSchema(SimpleSchema.to_json_schema()),
    ],
) -> Annotated[
    DataFrame[SimpleSchema],
    WithJsonSchema(SimpleSchema.to_json_schema()),
]:
    return pm3 |
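Since `WithJsonSchema` takes a plain JSON-schema dictionary and only overrides what shows up in the generated docs, the schema can in principle also be written out by hand, for instance to describe row-oriented input (a list of row objects). The sketch below is only an illustration: `records_json_schema` and `DTYPE_TO_JSON_TYPE` are hypothetical names, not part of pandera or pydantic, and the dtype strings simply mirror the ones used in the first workaround above. It says nothing about how the request body is actually parsed.

```python
from typing import Any

# Hypothetical mapping from pandas dtype names to JSON Schema type names.
DTYPE_TO_JSON_TYPE = {
    "str": "string",
    "int64": "integer",
    "float64": "number",
    "bool": "boolean",
}


def records_json_schema(columns: dict[str, str]) -> dict[str, Any]:
    """Build a JSON schema describing a list of row objects."""
    properties = {
        name: {"type": DTYPE_TO_JSON_TYPE[dtype]} for name, dtype in columns.items()
    }
    return {
        "type": "array",
        "items": {
            "type": "object",
            "properties": properties,
            "required": list(columns),
        },
    }


# Column names and dtypes are written out manually here (an assumption for
# the sketch); they are not read from a pandera model.
schema = records_json_schema({"str_col": "str"})
```

The resulting dict could then be passed as `WithJsonSchema(schema)` in place of `SimpleSchema.to_json_schema()`.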
Thanks @riziles, this works great. Do you know how I can provide input data in "records" format? I tried adding
However I got this error message: Below is my full code
|
@ragrawal , I'd recommend creating your own custom Pydantic class to read in whatever format you want if you don't want to use Pandera's default config. For example, something like this:
import pandas as pd
import pandera as pa
from pandera.typing import DataFrame as DataFrame, Series
from typing import Annotated
from pydantic import WithJsonSchema, BaseModel
from fastapi import FastAPI

app = FastAPI()

class SimpleSchema(pa.DataFrameModel):
    str_col: Series[str]
    str_col2: Series[str]

    class Config:
        to_format = "dict"

class InputModel(BaseModel):
    str_col: str
    str_col2: str

@app.post("/input_api")
def input_this(
    pm3: list[InputModel],
) -> list[str]:
    df = DataFrame[SimpleSchema](pd.DataFrame([vars(i) for i in pm3]))
    print(pm3)
    print(type(df))
    return df["str_col2"].to_list() |
@ragrawal , can we close this issue? |
Sure..appreciate your help on this. |
@riziles I think the issue is still relevant despite the above workaround since ideally pandera would work without special annotation when generating schema in pydantic and fastapi |
Wait a second. Just realizing that I opened this issue. I'm closing it as resolved because this project is awesome and @cosmicBboy probably has better things to work on. |
Agree with @eharkins. FastAPI is already very popular and it is likely to become the most popular python web framework in the future. I believe that having full compatibility on documentation generation would be beneficial for pandera usage in production environments. |
@riziles let's open it back up! There's a WIP PR that addresses it #1704 but there are still some unit test issues on it. @imseananriley not sure if you still have capacity to work on this, if not perhaps someone on the thread can look into making tests pass |
imseanriley is preoccupied at the moment. I might be able to throw some resources at it this summer, but I'd much rather focus on killing the Pandas dependency. We're very intent on migrating to Polars/Lance/DuckDB. Right now there is a competing project that has better Polars support: https://github.com/JakobGM/patito . I'd prefer to leave our Pandera models intact, but not if I have to keep Pandas in our containers. |
thanks @riziles, let me digest this feedback. It might be time to do
What are some of the deltas you see in patito that are missing in the pandera-polars integration? |
In the meantime I'll look into fixing up #1704 to unblock this issue |
It's just the removal of the Pandas dependency. Pandas is a heavy package that takes up a lot of space when spinning up environments and slows down start times if it needs to be imported. |
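One rough way to see the start-up cost described here is to time a fresh interpreter with and without the import. This is a sketch, not a benchmark: `time_cold_import` is a made-up helper name, and it uses the standard-library `json` module so it runs anywhere; swapping in `"pandas"` assumes pandas is installed. For a per-module breakdown, `python -X importtime -c "import pandas"` is another option.

```python
import subprocess
import sys
import time


def time_cold_import(module_name: str) -> float:
    """Seconds a fresh interpreter needs to start up and import `module_name`."""
    start = time.perf_counter()
    # A new process guarantees the module is not already cached in sys.modules.
    subprocess.run([sys.executable, "-c", f"import {module_name}"], check=True)
    return time.perf_counter() - start


baseline = time_cold_import("sys")    # roughly interpreter start-up alone
with_json = time_cold_import("json")  # replace "json" with "pandas" to compare
print(f"baseline: {baseline:.3f}s, with json: {with_json:.3f}s")
```

The difference between the two numbers is a crude estimate of the import cost; single runs are noisy, so repeating the measurement a few times gives a better picture.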
@ragrawal , I just discovered @cosmicBboy 's easier way to do what you are looking for:
import pandera as pa
from pandera.typing import DataFrame as DataFrame
from pydantic import BaseModel, TypeAdapter
from pandera.engines.pandas_engine import PydanticModel
from fastapi import FastAPI

app = FastAPI()

class InputModel(BaseModel):
    str_col: str
    str_col2: str

class SimpleSchema(pa.DataFrameModel):
    class Config:  # type: ignore
        dtype = PydanticModel(InputModel)
        coerce = True

@app.post("/test")
def input_this(pm3: list[InputModel]) -> list[str]:
    df = DataFrame[SimpleSchema](TypeAdapter(list[InputModel]).dump_python(pm3))
    return df["str_col2"].to_list() |
Hi @riziles -- Thanks for the suggestion. I have used PydanticModel before and had two concerns
|
@ragrawal , if you want to input row wise data, there's always going to be more overhead. The whole reason Pandas, Polars, Arrow, Lance and DuckDB are so fast is that the data is stored in column vectors. |
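To make the row-versus-column point concrete, here is a small standard-library sketch (no pandas involved; `records_to_columns` is a hypothetical helper) of what has to happen when row-wise input is turned into column vectors: every cell is visited once and regrouped by column name.

```python
def records_to_columns(records: list[dict[str, object]]) -> dict[str, list[object]]:
    """Regroup row-wise records into one list per column.

    Assumes every record carries the same keys; missing keys would
    silently produce columns of unequal length.
    """
    columns: dict[str, list[object]] = {}
    for row in records:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


rows = [
    {"str_col": "a", "str_col2": "x"},
    {"str_col": "b", "str_col2": "y"},
]
columns = records_to_columns(rows)
print(columns)  # {'str_col': ['a', 'b'], 'str_col2': ['x', 'y']}
```

Input that already arrives column-wise skips this regrouping step, which is the overhead being referred to.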
fixed by #1904 |
Hi @cosmicBboy -- with #1904 now merged, I'm wondering how to simplify the solution below
|
@ragrawal I can test it out and see if we can simplify. Can you share full repro code on starting the server and making a call to the |
If I understand correctly, the above code is a workaround to enable proper JSON schema generation for OpenAPI docs. E.g. running the above code stored in
from typing import Annotated
import pandera as pa
from pandera.typing import DataFrame as DataFrame, Series
from fastapi import FastAPI, Body

app = FastAPI()

class SimpleSchema(pa.DataFrameModel):
    str_col: Series[str]

@app.post("/input_api")
def input_this(
    pm3: Annotated[DataFrame[SimpleSchema], Body()],
) -> DataFrame[SimpleSchema]:
    return pm3 |
Hey @riziles just to follow up here: I made a PR that removes the pandas dependency from polars, and makes it the user's responsibility to install pandas explicitly (or use the |
Thank you @cosmicBboy ! |
I believe that the latest versions of Pydantic and Pandera are not fully compatible.
This relates to #1395 which was closed, but I think should still be open
This code throws an error:
error message:
I have tried various config options to get around this error to no avail.