
Commit ebb63df

Add test_gemma3.rs for EmbeddingGemma (#718)
1 parent d7af1fc commit ebb63df

File tree

6 files changed: +3183 −46 lines

.github/workflows/test.yaml

Lines changed: 3 additions & 2 deletions
```diff
@@ -4,7 +4,7 @@ on:
   workflow_dispatch:
   push:
     branches:
-      - 'main'
+      - "main"
   pull_request:
     paths:
       - ".github/workflows/build.yaml"
@@ -17,7 +17,7 @@ on:
       - "rust-toolchain.toml"
       - "Dockerfile"
     branches:
-      - 'main'
+      - "main"
 
 jobs:
   tests:
@@ -38,6 +38,7 @@ jobs:
         env:
           SCCACHE_GHA_ENABLED: "true"
           RUSTC_WRAPPER: "sccache"
+          HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
         run: |
           sudo apt-get update && sudo apt-get install protobuf-compiler -y
           cargo test --profile=release-debug
```

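The new `HF_TOKEN` entry gives CI a Hugging Face Hub credential, presumably so the Gemma3 test added in this commit can download the gated google/embeddinggemma-300m weights. Below is a minimal sketch of reproducing the same workflow step locally, assuming a READ token is available (the angle-bracket value is a placeholder):

```shell
# Install the protobuf compiler the build expects (Debian/Ubuntu).
sudo apt-get update && sudo apt-get install protobuf-compiler -y

# Export a Hugging Face READ token so gated model weights can be fetched during the tests.
export HF_TOKEN=<your CLI READ token>

# Run the test suite with the same profile the workflow uses.
cargo test --profile=release-debug
```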
README.md

Lines changed: 26 additions & 24 deletions
````diff
@@ -11,7 +11,7 @@
 
 A blazing fast inference solution for text embeddings models.
 
-Benchmark for [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on an Nvidia A10 with a sequence
+Benchmark for [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on an NVIDIA A10 with a sequence
 length of 512 tokens:
 
 <p>
@@ -66,29 +66,31 @@ Ember, GTE and E5. TEI implements many features such as:
 #### Text Embeddings
 
 Text Embeddings Inference currently supports Nomic, BERT, CamemBERT, XLM-RoBERTa models with absolute positions, JinaBERT
-model with Alibi positions and Mistral, Alibaba GTE, Qwen2 models with Rope positions, MPNet, ModernBERT, and Qwen3.
+model with Alibi positions and Mistral, Alibaba GTE, Qwen2 models with Rope positions, MPNet, ModernBERT, Qwen3, and Gemma3.
 
 Below are some examples of the currently supported models:
 
-| MTEB Rank | Model Size          | Model Type  | Model ID                                                                                          |
-|-----------|---------------------|-------------|---------------------------------------------------------------------------------------------------|
-| 2 | 8B (Very Expensive) | Qwen3 | [Qwen/Qwen3-Embedding-8B](https://hf.co/Qwen/Qwen3-Embedding-8B) |
-| 4 | 0.6B | Qwen3 | [Qwen/Qwen3-Embedding-0.6B](https://hf.co/Qwen/Qwen3-Embedding-0.6B) |
-| 6 | 7B (Very Expensive) | Qwen2 | [Alibaba-NLP/gte-Qwen2-7B-instruct](https://hf.co/Alibaba-NLP/gte-Qwen2-7B-instruct) |
-| 7 | 0.5B | XLM-RoBERTa | [intfloat/multilingual-e5-large-instruct](https://hf.co/intfloat/multilingual-e5-large-instruct) |
-| 14 | 1.5B (Expensive) | Qwen2 | [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://hf.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct) |
-| 17 | 7B (Very Expensive) | Mistral | [Salesforce/SFR-Embedding-2_R](https://hf.co/Salesforce/SFR-Embedding-2_R) |
-| 34 | 0.5B | XLM-RoBERTa | [Snowflake/snowflake-arctic-embed-l-v2.0](https://hf.co/Snowflake/snowflake-arctic-embed-l-v2.0) |
-| 40 | 0.3B | Alibaba GTE | [Snowflake/snowflake-arctic-embed-m-v2.0](https://hf.co/Snowflake/snowflake-arctic-embed-m-v2.0) |
-| 51 | 0.3B | Bert | [WhereIsAI/UAE-Large-V1](https://hf.co/WhereIsAI/UAE-Large-V1) |
-| N/A | 0.4B | Alibaba GTE | [Alibaba-NLP/gte-large-en-v1.5](https://hf.co/Alibaba-NLP/gte-large-en-v1.5) |
-| N/A | 0.4B | ModernBERT | [answerdotai/ModernBERT-large](https://hf.co/answerdotai/ModernBERT-large) |
-| N/A | 0.3B | NomicBert | [nomic-ai/nomic-embed-text-v2-moe](https://hf.co/nomic-ai/nomic-embed-text-v2-moe) |
-| N/A | 0.1B | NomicBert | [nomic-ai/nomic-embed-text-v1](https://hf.co/nomic-ai/nomic-embed-text-v1) |
-| N/A | 0.1B | NomicBert | [nomic-ai/nomic-embed-text-v1.5](https://hf.co/nomic-ai/nomic-embed-text-v1.5) |
-| N/A | 0.1B | JinaBERT | [jinaai/jina-embeddings-v2-base-en](https://hf.co/jinaai/jina-embeddings-v2-base-en) |
-| N/A | 0.1B | JinaBERT | [jinaai/jina-embeddings-v2-base-code](https://hf.co/jinaai/jina-embeddings-v2-base-code) |
-| N/A | 0.1B | MPNet | [sentence-transformers/all-mpnet-base-v2](https://hf.co/sentence-transformers/all-mpnet-base-v2) |
+| MTEB Rank | Model Size             | Model Type     | Model ID                                                                                          |
+|-----------|------------------------|----------------|---------------------------------------------------------------------------------------------------|
+| 2 | 7.57B (Very Expensive) | Qwen3 | [Qwen/Qwen3-Embedding-8B](https://hf.co/Qwen/Qwen3-Embedding-8B) |
+| 3 | 4.02B (Very Expensive) | Qwen3 | [Qwen/Qwen3-Embedding-4B](https://hf.co/Qwen/Qwen3-Embedding-4B) |
+| 4 | 509M | Qwen3 | [Qwen/Qwen3-Embedding-0.6B](https://hf.co/Qwen/Qwen3-Embedding-0.6B) |
+| 6 | 7.61B (Very Expensive) | Qwen2 | [Alibaba-NLP/gte-Qwen2-7B-instruct](https://hf.co/Alibaba-NLP/gte-Qwen2-7B-instruct) |
+| 7 | 560M | XLM-RoBERTa | [intfloat/multilingual-e5-large-instruct](https://hf.co/intfloat/multilingual-e5-large-instruct) |
+| 8 | 308M | Gemma3 | [google/embeddinggemma-300m](https://hf.co/google/embeddinggemma-300m) (gated) |
+| 15 | 1.78B (Expensive) | Qwen2 | [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://hf.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct) |
+| 18 | 7.11B (Very Expensive) | Mistral | [Salesforce/SFR-Embedding-2_R](https://hf.co/Salesforce/SFR-Embedding-2_R) |
+| 35 | 568M | XLM-RoBERTa | [Snowflake/snowflake-arctic-embed-l-v2.0](https://hf.co/Snowflake/snowflake-arctic-embed-l-v2.0) |
+| 41 | 305M | Alibaba GTE | [Snowflake/snowflake-arctic-embed-m-v2.0](https://hf.co/Snowflake/snowflake-arctic-embed-m-v2.0) |
+| 52 | 335M | BERT | [WhereIsAI/UAE-Large-V1](https://hf.co/WhereIsAI/UAE-Large-V1) |
+| 58 | 137M | NomicBERT | [nomic-ai/nomic-embed-text-v1](https://hf.co/nomic-ai/nomic-embed-text-v1) |
+| 79 | 137M | NomicBERT | [nomic-ai/nomic-embed-text-v1.5](https://hf.co/nomic-ai/nomic-embed-text-v1.5) |
+| 103 | 109M | MPNet | [sentence-transformers/all-mpnet-base-v2](https://hf.co/sentence-transformers/all-mpnet-base-v2) |
+| N/A | 475M-A305M | NomicBERT | [nomic-ai/nomic-embed-text-v2-moe](https://hf.co/nomic-ai/nomic-embed-text-v2-moe) |
+| N/A | 434M | Alibaba GTE | [Alibaba-NLP/gte-large-en-v1.5](https://hf.co/Alibaba-NLP/gte-large-en-v1.5) |
+| N/A | 396M | ModernBERT | [answerdotai/ModernBERT-large](https://hf.co/answerdotai/ModernBERT-large) |
+| N/A | 137M | JinaBERT | [jinaai/jina-embeddings-v2-base-en](https://hf.co/jinaai/jina-embeddings-v2-base-en) |
+| N/A | 137M | JinaBERT | [jinaai/jina-embeddings-v2-base-code](https://hf.co/jinaai/jina-embeddings-v2-base-code) |
 
 To explore the list of best performing text embeddings models, visit the
 [Massive Text Embedding Benchmark (MTEB) Leaderboard](https://huggingface.co/spaces/mteb/leaderboard).
@@ -352,15 +354,15 @@ You have the option to utilize the `HF_TOKEN` environment variable for configuri
 For example:
 
 1. Go to https://huggingface.co/settings/tokens
-2. Copy your cli READ token
-3. Export `HF_TOKEN=<your cli READ token>`
+2. Copy your CLI READ token
+3. Export `HF_TOKEN=<your CLI READ token>`
 
 or with Docker:
 
 ```shell
 model=<your private model>
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
-token=<your cli READ token>
+token=<your CLI READ token>
 
 docker run --gpus all -e HF_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.8 --model-id $model
 ```
````

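As an end-to-end check of the gated-model flow above, the same Docker command from the README can be pointed at the Gemma3 entry added to the table. A sketch, assuming access to google/embeddinggemma-300m has been granted to your account and a READ token is available (angle-bracket values are placeholders):

```shell
model=google/embeddinggemma-300m  # gated Gemma3 embedding model from the README table
volume=$PWD/data                  # share a volume with the Docker container to avoid re-downloading weights
token=<your CLI READ token>       # READ token from https://huggingface.co/settings/tokens

docker run --gpus all -e HF_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.8 --model-id $model
```

Once the container is up, embeddings can be requested against the usual `/embed` route, e.g. `curl 127.0.0.1:8080/embed -X POST -d '{"inputs":"What is Deep Learning?"}' -H 'Content-Type: application/json'`.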