Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,8 @@
sections:
- local: concepts/git_vs_http
title: Git vs HTTP paradigm
- local: concepts/migration
title: Migrating to huggingface_hub v1.0
- title: 'Reference'
sections:
- local: package_reference/overview
Expand Down
53 changes: 11 additions & 42 deletions docs/source/en/concepts/git_vs_http.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,59 +4,28 @@ rendered properly in your Markdown viewer.

# Git vs HTTP paradigm

The `huggingface_hub` library is a library for interacting with the Hugging Face Hub, which is a
collection of git-based repositories (models, datasets or Spaces). There are two main
ways to access the Hub using `huggingface_hub`.
The `huggingface_hub` library is a library for interacting with the Hugging Face Hub, which is a collection of git-based repositories (models, datasets or Spaces). There are two main ways to access the Hub using `huggingface_hub`.

The first approach, the so-called "git-based" approach, is led by the [`Repository`] class.
This method uses a wrapper around the `git` command with additional functions specifically
designed to interact with the Hub. The second option, called the "HTTP-based" approach,
involves making HTTP requests using the [`HfApi`] client. Let's examine the pros and cons
of each approach.
The first approach, the so-called "git-based" approach, relies on using standard `git` commands directly in a terminal. This method allows you to clone repositories, create commits, and push changes manually. The second option, called the "HTTP-based" approach, involves making HTTP requests using the [`HfApi`] client. Let's examine the pros and cons of each approach.

## Repository: the historical git-based approach
## Git: the historical CLI-based approach

At first, `huggingface_hub` was mostly built around the [`Repository`] class. It provides
Python wrappers for common `git` commands such as `"git add"`, `"git commit"`, `"git push"`,
`"git tag"`, `"git checkout"`, etc.
At first, most users interacted with the Hugging Face Hub using plain `git` commands such as `git clone`, `git add`, `git commit`, `git push`, `git tag`, or `git checkout`.

The library also helps with setting credentials and tracking large files, which are often
used in machine learning repositories. Additionally, the library allows you to execute its
methods in the background, making it useful for uploading data during training.
This approach lets you work with a full local copy of the repository on your machine, just like in traditional software development. This can be an advantage when you need offline access or want to work with the full history of a repository. However, it also comes with downsides: you are responsible for keeping the repository up-to-date locally, handling credentials, and managing large files (via `git-lfs`), which can become cumbersome when working with large machine learning models or datasets.

The main advantage of using a [`Repository`] is that it allows you to maintain a local
copy of the entire repository on your machine. This can also be a disadvantage as
it requires you to constantly update and maintain this local copy. This is similar to
traditional software development where each developer maintains their own local copy and
pushes changes when working on a feature. However, in the context of machine learning,
this may not always be necessary as users may only need to download weights for inference
or convert weights from one format to another without the need to clone the entire
repository.

<Tip warning={true}>

[`Repository`] is now deprecated in favor of the http-based alternatives. Given its large adoption in legacy code, the complete removal of [`Repository`] will only happen in release `v1.0`.

</Tip>
In many machine learning workflows, you may only need to download a few files for inference or convert weights without needing to clone the entire repository. In such cases, using `git` can be overkill and introduce unnecessary complexity.

## HfApi: a flexible and convenient HTTP client

The [`HfApi`] class was developed to provide an alternative to local git repositories, which
can be cumbersome to maintain, especially when dealing with large models or datasets. The
[`HfApi`] class offers the same functionality as git-based approaches, such as downloading
and pushing files and creating branches and tags, but without the need for a local folder
that needs to be kept in sync.
The [`HfApi`] class was developed to provide an alternative to using local git repositories, which can be cumbersome to maintain, especially when dealing with large models or datasets. The [`HfApi`] class offers the same functionality as git-based workflows -such as downloading and pushing files and creating branches and tags- but without the need for a local folder that needs to be kept in sync.

In addition to the functionalities already provided by `git`, the [`HfApi`] class offers
additional features, such as the ability to manage repos, download files using caching for
efficient reuse, search the Hub for repos and metadata, access community features such as
discussions, PRs, and comments, and configure Spaces hardware and secrets.
In addition to the functionalities already provided by `git`, the [`HfApi`] class offers additional features, such as the ability to manage repos, download files using caching for efficient reuse, search the Hub for repos and metadata, access community features such as discussions, PRs, and comments, and configure Spaces hardware and secrets.

## What should I use ? And when ?

Overall, the **HTTP-based approach is the recommended way to use** `huggingface_hub`
in all cases. [`HfApi`] allows to pull and push changes, work with PRs, tags and branches, interact with discussions and much more. Since the `0.16` release, the http-based methods can also run in the background, which was the last major advantage of the [`Repository`] class.
Overall, the **HTTP-based approach is the recommended way to use** `huggingface_hub` in all cases. [`HfApi`] allows you to pull and push changes, work with PRs, tags and branches, interact with discussions and much more.

However, not all git commands are available through [`HfApi`]. Some may never be implemented, but we are always trying to improve and close the gap. If you don't see your use case covered, please open [an issue on Github](https://github.com/huggingface/huggingface_hub)! We welcome feedback to help build the 🤗 ecosystem with and for our users.
However, not all git commands are available through [`HfApi`]. Some may never be implemented, but we are always trying to improve and close the gap. If you don't see your use case covered, please open [an issue on GitHub](https://github.com/huggingface/huggingface_hub)! We welcome feedback to help build the HF ecosystem with and for our users.

This preference of the http-based [`HfApi`] over the git-based [`Repository`] does not mean that git versioning will disappear from the Hugging Face Hub anytime soon. It will always be possible to use `git` commands locally in workflows where it makes sense.
This preference for the HTTP-based [`HfApi`] over direct `git` commands does not mean that git versioning will disappear from the Hugging Face Hub anytime soon. It will always be possible to use `git` locally in workflows where it makes sense.
74 changes: 74 additions & 0 deletions docs/source/en/concepts/migration.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
# Migrating to huggingface_hub v1.0

The v1.0 release is a major milestone for the `huggingface_hub` library. It marks our commitment to API stability and the maturity of the library. We have made several improvements and breaking changes to make the library more robust and easier to use.

This guide is intended to help you migrate your existing code to the new version. If you have any questions or feedback, please let us know by [opening an issue on GitHub](https://github.com/huggingface/huggingface_hub/issues).

## Python 3.9+

`huggingface_hub` now requires Python 3.9 or higher. Python 3.8 is no longer supported.

## HTTPX migration

The `huggingface_hub` library now uses [`httpx`](https://www.python-httpx.org/) instead of `requests` for HTTP requests. This change was made to improve performance and to support both synchronous and asynchronous requests the same way. We therefore dropped both `requests` and `aiohttp` dependencies.

This is a major change that affects the entire library. While we have tried to make this change as transparent as possible, you may need to update your code in some cases. Here is a list of breaking changes introduced in the process:

- **Proxy configuration**: "per method" proxies are no longer supported. Proxies must be configured globally using the `HTTP_PROXY` and `HTTPS_PROXY` environment variables.
- **Custom HTTP backend**: The `configure_http_backend` function has been removed. You should now use [`set_client_factory`] and [`set_async_client_factory`] to configure the HTTP clients.
- **Error handling**: HTTP errors are not inherited from `requests.HTTPError` anymore, but from `httpx.HTTPError`. We recommend catching `huggingface_hub.HfHubHttpError` which is a subclass of `requests.HTTPError` in v0.x and of `httpx.HTTPError` in v1.x. Catching from the `huggingface_hub` error ensures your code is compatible with both the old and new versions of the library.
- **SSLError**: `httpx` does not have the concept of `SSLError`. It is now a generic `httpx.ConnectError`.
- **`LocalEntryNotFoundError`**: This error no longer inherits from `HTTPError`. We now define a `EntryNotFoundError` (new) that is inherited by both [`LocalEntryNotFoundError`] (if file not found in local cache) and [`RemoteEntryNotFoundError`] (if file not found in repo on the Hub). Only the remote error inherits from `HTTPError`.
- **`InferenceClient`**: The `InferenceClient` can now be used as a context manager. This is especially useful when streaming tokens from a language model to ensure that the connection is closed properly.
- **`AsyncInferenceClient`**: The `trust_env` parameter has been removed from the `AsyncInferenceClient`'s constructor. Environment variables are trusted by default by `httpx`. If you explicitly don't want to trust the environment, you must configure it with [`set_client_factory`].

For more details, you can check [PR #3328](https://github.com/huggingface/huggingface_hub/pull/3328) that introduced `httpx`.

## `Repository` class

The `Repository` class has been removed in v1.0. It was a thin wrapper around the `git` CLI for managing repositories. You can still use `git` directly in the terminal, but the recommended approach is to use the HTTP-based API in the `huggingface_hub` library for a smoother experience, especially when dealing with large files.

Here is a mapping from the legacy `Repository` class to the new `HfApi` one:

| `Repository` method | `HfApi` method |
| ------------------------------------------ | ----------------------------------------------------- |
| `repo.clone_from` | `snapshot_download` |
| `repo.git_add` + `git_commit` + `git_push` | [`upload_file`], [`upload_folder`], [`create_commit`] |
| `repo.git_tag` | `create_tag` |
| `repo.git_branch` | `create_branch` |

## `HfFolder` class

`HfFolder` was used to manage the user access token. Use [`login`] to save a new token, [`logout`] to delete it and [`whoami`] to check the user associated to the current token. Finally, use [`get_token`] to retrieve user's token in a script.


## `InferenceApi` class

`InferenceApi` was a class to interact with the Inference API. It is now recommended to use the [`InferenceClient`] class instead.

## Other deprecated features

Some methods and parameters have been removed in v1.0. The ones listed below have already been deprecated with a warning message in v0.x.

- `constants.hf_cache_home` has been removed. Please use `HF_HOME` instead.
- `use_auth_token` parameters have been removed from all methods. Please use `token` instead.
- `get_token_permission` method has been removed.
- `update_repo_visibility` method has been removed. Please use `update_repo_settings` instead.
- `is_write_action` parameter has been removed from `build_hf_headers` as well as `write_permission` from `login`. The concept of "write permission" has been removed and is no longer relevant now that fine-grained tokens are the recommended approach.
- `new_session` parameter in `login` has been renamed to `skip_if_logged_in` for better clarity.
- `resume_download`, `force_filename`, and `local_dir_use_symlinks` parameters have been removed from `hf_hub_download` and `snapshot_download`.
- `library`, `language`, `tags`, and `task` parameters have been removed from `list_models`.

## TensorFlow and Keras 2.x support

All TensorFlow-related code and dependencies have been removed in v1.0. This includes the following breaking changes:

- `huggingface_hub[tensorflow]` is no longer a supported extra dependency
- The `split_tf_state_dict_into_shards` and `get_tf_storage_size` utility functions have been removed.
- The `tensorflow`, `fastai`, and `fastcore` versions are no longer included in the built-in headers.

The Keras 2.x integration has also been removed. This includes the `KerasModelHubMixin` class and the `save_pretrained_keras`, `from_pretrained_keras`, and `push_to_hub_keras` utilities. Keras 2.x is a legacy and unmaintained library. The recommended approach is to use Keras 3.x which is tightly integrated with the Hub (i.e. it contains built-in method to load/push to Hub). If you still want to work with Keras 2.x, you should downgrade `huggingface_hub` to v0.x version.

## `upload_file` and `upload_folder` return values

The [`upload_file`] and [`upload_folder`] functions now return the URL of the commit created on the Hub. Previously, they returned the URL of the file or folder. This is to align with the return value of [`create_commit`], [`delete_file`] and [`delete_folder`].
42 changes: 30 additions & 12 deletions docs/source/en/package_reference/utilities.md
Original file line number Diff line number Diff line change
Expand Up @@ -120,23 +120,41 @@ You can also enable or disable progress bars for specific groups. This allows yo

[[autodoc]] huggingface_hub.utils.enable_progress_bars

## Configure HTTP backend
## Configuring the HTTP Backend

In some environments, you might want to configure how HTTP calls are made, for example if you are using a proxy.
`huggingface_hub` let you configure this globally using [`configure_http_backend`]. All requests made to the Hub will
then use your settings. Under the hood, `huggingface_hub` uses `requests.Session` so you might want to refer to the
[`requests` documentation](https://requests.readthedocs.io/en/latest/user/advanced) to learn more about the available
parameters.
<Tip>

Since `requests.Session` is not guaranteed to be thread-safe, `huggingface_hub` creates one session instance per thread.
Using sessions allows us to keep the connection open between HTTP calls and ultimately save time. If you are
integrating `huggingface_hub` in a third-party library and wants to make a custom call to the Hub, use [`get_session`]
to get a Session configured by your users (i.e. replace any `requests.get(...)` call by `get_session().get(...)`).
In `huggingface_hub` v0.x, HTTP requests were handled with `requests`, and configuration was done via `
git push --set-upstream origin v1.0-some-more-docsend`. Since we now use `httpx`, configuration works differently: you must provide a factory function that takes no arguments and returns an `httpx.Client`. You can review the [default implementation here](https://github.com/huggingface/huggingface_hub/blob/v1.0-release/src/huggingface_hub/utils/_http.py) to see which parameters are used by default.

[[autodoc]] configure_http_backend
</Tip>


In some setups, you may need to control how HTTP requests are made, for example when working behind a proxy. The `huggingface_hub` library allows you to configure this globally with [`set_client_factory`]. After configuration, all requests to the Hub will use your custom settings. Since `huggingface_hub` relies on `httpx.Client` under the hood, you can check the [`httpx` documentation](https://www.python-httpx.org/advanced/clients/) for details on available parameters.

If you are building a third-party library and need to make direct requests to the Hub, use [`get_session`] to obtain a correctly configured `httpx` client. Replace any direct `httpx.get(...)` calls with `get_session().get(...)` to ensure proper behavior.

[[autodoc]] set_client_factory

[[autodoc]] get_session

In rare cases, you may want to manually close the current session (for example, after a transient `SSLError`). You can do this with [`close_session`]. A new session will automatically be created on the next call to [`get_session`].

Sessions are always closed automatically when the process exits.

[[autodoc]] close_session

For async code, use [`set_async_client_factory`] to configure an `httpx.AsyncClient` and [`get_async_session`] to retrieve one.

[[autodoc]] set_async_client_factory

[[autodoc]] get_async_session

<Tip>

Unlike the synchronous client, the lifecycle of the async client is not managed automatically. Use an async context manager to handle it properly.

</Tip>

## Handle HTTP errors

Expand Down Expand Up @@ -278,4 +296,4 @@ validated.

Not exactly a validator, but ran as well.

[[autodoc]] utils.smoothly_deprecate_legacy_arguments
[[autodoc]] utils._validators.smoothly_deprecate_legacy_arguments
2 changes: 0 additions & 2 deletions docs/source/ko/_toctree.yml
Original file line number Diff line number Diff line change
Expand Up @@ -18,8 +18,6 @@
title: 명령줄 인터페이스(CLI) 사용하기
- local: guides/hf_file_system
title: Hf파일시스템
- local: guides/repository
title: 리포지토리
- local: guides/search
title: Hub에서 검색하기
- local: guides/inference
Expand Down
10 changes: 0 additions & 10 deletions docs/source/ko/package_reference/utilities.md
Original file line number Diff line number Diff line change
Expand Up @@ -84,16 +84,6 @@ True

[[autodoc]] huggingface_hub.utils.enable_progress_bars

## HTTP 백엔드 구성[[huggingface_hub.configure_http_backend]]

일부 환경에서는 HTTP 호출이 이루어지는 방식을 구성할 수 있습니다. 예를 들어, 프록시를 사용하는 경우가 그렇습니다. `huggingface_hub`는 [`configure_http_backend`]를 사용하여 전역적으로 이를 구성할 수 있게 합니다. 그러면 Hub로의 모든 요청이 사용자가 설정한 설정을 사용합니다. 내부적으로 `huggingface_hub`는 `requests.Session`을 사용하므로 사용 가능한 매개변수에 대해 자세히 알아보려면 [requests 문서](https://requests.readthedocs.io/en/latest/user/advanced)를 참조하는 것이 좋습니다.

`requests.Session`이 스레드 안전을 보장하지 않기 때문에 `huggingface_hub`는 스레드당 하나의 세션 인스턴스를 생성합니다. 세션을 사용하면 HTTP 호출 사이에 연결을 유지하고 최종적으로 시간을 절약할 수 있습니다. `huggingface_hub`를 서드 파티 라이브러리에 통합하고 사용자 지정 호출을 Hub로 만들려는 경우, [`get_session`]을 사용하여 사용자가 구성한 세션을 가져옵니다 (즉, 모든 `requests.get(...)` 호출을 `get_session().get(...)`으로 대체합니다).

[[autodoc]] configure_http_backend

[[autodoc]] get_session


## HTTP 오류 다루기[[handle-http-errors]]

Expand Down
6 changes: 3 additions & 3 deletions src/huggingface_hub/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -516,7 +516,7 @@
"HfHubAsyncTransport",
"HfHubTransport",
"cached_assets_path",
"close_client",
"close_session",
"dump_environment_info",
"get_async_session",
"get_session",
Expand Down Expand Up @@ -815,7 +815,7 @@
"cancel_access_request",
"cancel_job",
"change_discussion_status",
"close_client",
"close_session",
"comment_discussion",
"create_branch",
"create_collection",
Expand Down Expand Up @@ -1518,7 +1518,7 @@ def __dir__():
HfHubAsyncTransport, # noqa: F401
HfHubTransport, # noqa: F401
cached_assets_path, # noqa: F401
close_client, # noqa: F401
close_session, # noqa: F401
dump_environment_info, # noqa: F401
get_async_session, # noqa: F401
get_session, # noqa: F401
Expand Down
2 changes: 1 addition & 1 deletion src/huggingface_hub/utils/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,7 @@
CLIENT_FACTORY_T,
HfHubAsyncTransport,
HfHubTransport,
close_client,
close_session,
fix_hf_endpoint_in_url,
get_async_session,
get_session,
Expand Down
Loading