-
-
Notifications
You must be signed in to change notification settings - Fork 685
[GSoC 2025] Load model JSON files from URLs #5137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
I've updated the
Please let me know if any part of this approach needs changing or improvisation. |
@medha-14 can you change the base branch here? Would be much easier to review |
src/pybamm/dispatch/entry_points.py
Outdated
def get_cache_path(url): | ||
cache_dir = Path.home() / ".pybamm_cache" / "pybamm" / "models" | ||
cache_dir.mkdir(parents=True, exist_ok=True) | ||
file_hash = hashlib.md5(url.encode()).hexdigest() | ||
return cache_dir / f"{file_hash}.json" | ||
|
||
|
||
def clear_model_cache(): | ||
cache_dir = Path.home() / ".pybamm_cache" / "pybamm" / "models" | ||
if cache_dir.exists(): | ||
for file in cache_dir.glob("*.json"): | ||
file.unlink() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be handled using platformdirs.user_cache_dir
: https://platformdirs.readthedocs.io/en/latest/api.html#cache-directory
Hi @medha-14, could you please create a PR within your fork from this link? medha-14/PyBaMM@GSoC...medha-14:PyBaMM:load_json_from_url. I see a better diff there. We could add comments on that PR until #5056 is merged. Otherwise, I think it is on the right track, modulo that we need to use |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #5137 +/- ##
===========================================
- Coverage 98.88% 98.86% -0.03%
===========================================
Files 320 320
Lines 26949 26988 +39
===========================================
+ Hits 26648 26681 +33
- Misses 301 307 +6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
from pybamm.expression_tree.operations.serialise import Serialise | ||
|
||
APP_NAME = "pybamm" | ||
APP_AUTHOR = "pybamm" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can keep just APP_NAME
and drop APP_AUTHOR
, and we can set Path(user_cache_dir(APP_NAME)) / "models"
as a constant here instead.
def get_cache_path(url: str) -> Path: | ||
cache_dir = _get_cache_dir() | ||
file_hash = hashlib.md5(url.encode()).hexdigest() | ||
return cache_dir / f"{file_hash}.json" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why MD5 and not SHA-256? :)
try: | ||
file.unlink() | ||
except Exception as e: | ||
# Optional: log error instead of failing silently |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Optional: log error instead of failing silently |
def clear_model_cache() -> None: | ||
cache_dir = _get_cache_dir() | ||
for file in cache_dir.glob("*.json"): | ||
try: | ||
file.unlink() | ||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, we also store some of PyBaMM's data files in the cache dir using a pooch
registry:
PyBaMM/src/pybamm/pybamm_data.py
Lines 126 to 139 in a1aa02c
def get_data(self, filename: str): | |
""" | |
Fetches the data file from upstream and stores it in the local cache directory under pybamm directory. | |
Parameters | |
---------- | |
filename : str | |
Name of the data file to be fetched from the registry. | |
Returns | |
------- | |
pathlib.PurePath | |
""" | |
self.registry.fetch(filename) | |
return pathlib.Path(f"{self.path}/{self.version}/{filename}") |
I wonder if we could reuse some of the code here, because clearing the cache directory of JSON files could have unintended side effects. We do have JSON files there: https://github.com/pybamm-team/pybamm-data/releases/tag/v1.0.1
Could we perhaps rewrite the PR to use pooch
instead? That will also provide safer defaults than using urllib.request.urlretrieve()
directly, and we could rely on it for the checksum/caching bits to see if we need to download the model JSON again or not.
def Model( | ||
model=None, | ||
url=None, | ||
force_download=False, | ||
*args, | ||
**kwargs, | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can have better typing here.
Description
This PR is based on #5056 and contains changes that build upon it.
Once #5056 is merged, the diff for this PR will automatically update to show only the relevant changes.
Type of change
Please add a line in the relevant section of CHANGELOG.md to document the change (include PR #)
Important checks:
Please confirm the following before marking the PR as ready for review:
nox -s pre-commit
nox -s tests
nox -s doctests