Skip to content

Conversation

rx-dwoodward
Copy link

Adds pydantic v1 support through validators like __get_validators__ and a custom .to_dict() method expected to be used alongside Config.json_encoders in pydantic v1 style BaseModel's.

I altered the testing structure for pydantic somewhat to ensure that both v1 / v2 'style' validation is tested using a parametrisation framework, and capped the pydantic version (for tests only) to <3 such that the pydantic v1 style validation should be tested against as well.

The main downside to capping the pydantic version for tests is if pydantic makes api changes in subsequent versions that do not accept the v1 shim then this testing framework would have to be split out to ensure coverage (in an environment with >3 and another with <3 tested which would be a pain, but this should have no effect on actual usage of the package since no pydantic code is explicitly used in the UPath still.

Issue link where this is discussed: #397

Let me know if there is anything else you'd like me to add to this - thanks!

@rx-dwoodward
Copy link
Author

@ap-- would it be possible to approve the workflow here? Was going to mark this as ready once all the linting / tests pass correctly.

Copy link
Collaborator

@ap-- ap-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rx-dwoodward, could you comment why this couldn't just implement __get_validators__?

What's the limitation in v1 here?

Comment on lines +195 to +201
def to_dict(self) -> SerializedUPath:
return {
"path": self.path,
"protocol": self.protocol,
"storage_options": dict(self.storage_options),
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is to_dict() needed? It does not seem to be a standard v1 BaseModel method.

Let's not extend the public UPath class API for pydantic v1 support

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah thats fair enough. I wanted to include a convenience method for enabling JSON serialization in pydantic v1 models. At present, there would not be a way to serialize this class that a V1 model would utilize - for example doing:

from pydantic.v1 import BaseModel

class Model(BaseModel):

    path: UPath


model = Model(path="s3://bucket/path/to/key/")

model.json() # hits TypeError: Object of type 'S3Path' is not JSON serializable

whereas if I do:

class Model(BaseModel):
    path: UPath
        class Config:
            json_encoders = {UPath: UPath.to_dict}

model = Model(path="s3://bucket/path/to/key/")
model.json() # == '{"path": {"path": "bucket/path/to/key", "protocol": "s3", "storage_options": {}}}'

So its mainly a way to expose an easier serialization method that can be used with v1 models - unfortunately the serialization format for v2 is not used with v1 models (and you can't define a hook method on the custom class to provide a standardized serialization method for the class 😭, its solely the responsibility of the v1 model that uses the custom class), so I thought it might be a useful idea to include a serialization method to the class to provide an easier method to use as a json encoder hook. Totally okay with removing it though if you think this is too much over-reach of this PR.

Copy link
Collaborator

@ap-- ap-- Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm...

We could introduce a general serializer helper function in universal-pathlib, that could then be used.

In principle this would be:

serialize_upath = upath.UPath.__get_pydantic_core_schema__(None, **None)['serialization']['function']

Thinking about this, that might be the best idea. Something similar will land with chaining support: https://github.com/fsspec/universal_pathlib/pull/346/files#diff-79e86edc0bf259fa57cede9d23833d37576186abbdfe51591786da661c2620dfR28-R31

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but wouldn't this enforce that the serialization method requires installation of pydantic >= 2 to execute? Since the .__get_pydantic_core_schema__ imports from a pydantic v2 import path to generate the schema first.

I agree I would like to_dict to be executed akin to TypeAdapter(UPath).dump_python(upath, mode="json") but didn't want to enforce installation of pydantic v2 just for dictifying the a UPath (especially since in the tests there are scenarios where JSON serialization would fail anyway if some storage_options are not serializable anyway, so figured it probably didn't make sense / matter to exactly use the TypeAdapter pattern (it would just be softly asking for an additional dependency to be added [softly because the pydantic v2 import would be within the serialization function, or in __get_pydantic_core_schema__ etc]).

But perhaps thats not a big deal - if you want to serialize a UPath - you need pydantic>=2 🤷

Comment on lines +1014 to +1016
@classmethod
def __get_validators__(cls) -> Iterator[Callable]:
yield cls._validate
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could support for v1 be added by only introducing __get_validators__ ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mentioned in a comment below but linked again for good measure - ultimately __get_validators__ requires yielding callable validation methods / functions that are used during validation - so we need to yield a callable still.

Comment on lines +1038 to +1048
@classmethod
def _validate(cls, v: Any) -> UPath:
if not isinstance(v, UPath):
v = cls._to_serialized_format(v)

return cls(
v["path"],
protocol=v.get("protocol"),
**v.get("storage_options", {}), # type: ignore[arg-type]
)
return v
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this classmethod needed specifically?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what is actually executed during v1 validation - I added this as a function such that now both v1 and v2 actually run this method when executing validation. I believe the v2 validation used lambda's beforehand depending on the input type within __get_pydantic_core_schema__, whereas this is a single source of truth for both v1 and v2 validation with this PR which imo is cleaner than implementing the validation procedure twice for the different versions.

LMK what you think though - I can change it back to having two separate procedures for v1 and v2 if need be.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me rephrase: Why does this have to be attached to the class interface. I.e. this could be a separate helper function for v1 support living outside of the UPath class.

I can change it back to having two separate procedures for v1 and v2 if need be.

I'd prefer that. It'll make maintenance easier down the line, when v1 support is dropped eventually.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I guess my point is that encapsulating this logic within one method that is used by both v1 (__get_validators__) and v2 (__get_pydantic_core_schema__) means that this logic is not necessarily just 'v1' logic. The only 'v1' logic that would be attached to the class is the __get_validators__ method ultimately. I believe having this validation entirely separate will make for more of a maintenance burden / require more code to be removed if v1 support is dropped.

But okay, so you would prefer having a separate function that is not a classmethod that undertakes this validation and have it separate to the v2 validation?

pyproject.toml Outdated
"pylint >=2.17.4",
"mypy >=1.10.0",
"pydantic >=2",
"pydantic >=2,<3", # <3 required for testing pydantic v1 support, not for actual use
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not predict the future. It's better if these tests break once pydantiv 3 actually breaks the api we would rely on in here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is fair - I can remove the pin but might keep the comment such that when tests inevitably fail when pydantic v3 is released there is something to note why this might be the case? Does that sound suitable?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI here is the version policy indicating the intention to remove pydantic.v1 from v3 onwards.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI here is the version policy indicating the intention to remove pydantic.v1 from v3 onwards.

I am not sure I follow. Where does is say in there that pydantic.v1 won't be available in v3 ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah sorry good point - I might have misread there - but here is the comment from Sam Colvin about intended removal of the V1 shim.

@rx-dwoodward
Copy link
Author

Hi @rx-dwoodward, could you comment why this couldn't just implement __get_validators__?

What's the limitation in v1 here?

so __get_validators__ is supposed to yield a number of validation functions, so I need to include the classmethod to provide the actual validation that gets run. See here.

@ap--
Copy link
Collaborator

ap-- commented Aug 27, 2025

so get_validators is supposed to yield a number of validation functions, so I need to include the classmethod to provide the actual validation that gets run. See here.

Well their example yields a classmethod, but i don't see a requirement for this to be a classmethod. The requirement is that you yield a callable, that adheres to the Validator interface. There seems to be note that these are classmethods https://docs.pydantic.dev/1.10/usage/validators/, but it's unclear to me if this is actually a requirement. You could still just bind a helper function to the class if needed before yielding it.

@rx-dwoodward
Copy link
Author

Yeah you are right it doesn't have to be a classmethod, I just figured its nicer to bind this validation as a method to the object rather than a standalone function outside of this. Typically its a classmethod such that cls(**v) or equivalent is used so that the output of validation returns an instance of the class - which ofc would be fine in this case straight up using UPath(...) because UPath dispatches to the correct type under the hood anyway.

So yes it does not have to be a classmethod if you'd prefer, I just figured it was cleaner to have this attached to the class (as is normally the case) compared to a separate function outside of the class.

@ap--
Copy link
Collaborator

ap-- commented Aug 27, 2025

So it sounds to me as if:

def somehow_serialize_upath_to_dict(pth: UPath) -> UPathTypedDict: ...

class Model(BaseModel):
    path: UPath
    
    class Config:
        json_encoders = {UPath: somehow_serialize_upath_to_dict}

Would be a recipe we can't get around for pydantic v1 support. That is because unlike with v2, there is no way to extend a class that does not inherit from BaseModel so that it provides its (de-)serialization methods to pydantic.

Is there a v1 way to make UPath the root model? so that there would be a UPathV1Model class, that has the necessary pydantic methods?

@rx-dwoodward
Copy link
Author

Yeah thats exactly right - its a particular annoyance of pydantic v1 to be honest.

I mean you could do something like:

from pydantic.v1 import BaseModel as V1BaseModel, PrivateAttr

class UPathModel(V1BaseModel):
    path: str
    protocol: str  = ""
    storage_options: dict[str, Any] | None = None

    _upath: UPath = PrivateAttr(None)

    class Config:
        orm_mode: bool = True
        # required when pydantic models have properties to serialize correctly.
        keep_untouched = (property,)

    def __init__(self, **data) -> None:
        super().__init__(**data)
        self._upath = UPath(self.path, protocol=self.protocol, **self.storage_options)
    
    # enables construction from a single string rather than a dictionary with the field keys
    @classmethod
    def from_orm(cls, v: Any) -> "UPathModel":
        upath = UPath(v)
        return cls(path=upath.path, protocol=upath.protocol, storage_options=upath.storage_options)
    
    # provides 'public' access to the UPath created.
    @property
    def upath(self) -> "UPath":
        return self._upath

# consumer of the UPath / UPathModel
class Model(V1BaseModel):
    path: UPathModel

# construction of `UPathModel` from a string using `from_orm` under the hood
model = Model(path="s3://bucket/path/to/key")
type(model.path) # == UPathModel
type(model.path.upath) # == S3Path

model.json() # == '{"path": {"path": "bucket/path/to/key", "protocol": "s3", "storage_options": {}}}'

Now this handles the jsonifying - because it jsonifies the model not the UPath, but it means you have to access the actual UPath via model.path.upath instead of just model.path.

We could change the UPathModel validation to end up returning the UPath instead of the UPathModel - but this then would get us back to where we started with json serializing the UPath failing and needing to incorporate a json_encoders config argument.

So imo each solution is inelegant, but directly using the UPath as a field parameter is imo nicer than having to swap the access of the path directly. Ofc you could mess around more and have __getattr__ on the UPathModel refer to calling getattr(self._upath) but that adds so much complexity imo compared to just asking v1 users to add the json_encoders for UPath with the convenience UPath.to_dict method.

LMK what you think though (and apologies for having to drop these annoying nuances of pydantic v1 on you 😓)

@rx-dwoodward
Copy link
Author

So it sounds to me as if:

def somehow_serialize_upath_to_dict(pth: UPath) -> UPathTypedDict: ...

class Model(BaseModel):
    path: UPath
    
    class Config:
        json_encoders = {UPath: somehow_serialize_upath_to_dict}

Would be a recipe we can't get around for pydantic v1 support. That is because unlike with v2, there is no way to extend a class that does not inherit from BaseModel so that it provides its (de-)serialization methods to pydantic.

Yeah this is exactly the case. Hence I thought it would be nicer to have the serialization as a method on the UPath than import it from some other location even if the function name is clear. But lmk if you would prefer a different solution!

@ap--
Copy link
Collaborator

ap-- commented Sep 1, 2025

xref: pydantic v2/v1 download statitistics: pydantic/pydantic#11613 (comment)

@ap--
Copy link
Collaborator

ap-- commented Oct 3, 2025

Hi @rx-dwoodward

I spent a bit of time thinking about this, and I'd be happy to integrate basic functionality for serialization into universal-pathlib, and provide a recipe in the README how to use it to set up a minimal pydantic v1 model.

If you're still interested, could you change this PR, and add a upath.serializers module with a upath_to_dict and upath_from_dict function, that allow going from/to UPath to/from SerializedUPath. Then I think the only addition needed in upath.core for more convenient v1 support is the __get_validators__ class method, that would yield upath_from_dict and UPath itself.

The PR could then add a small section to the readme for pydantic support, and provide a recipe for how to use it with v1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants