
Conversation


@sbalandi sbalandi commented Oct 10, 2025

Description

Simplify the preprocessing step for the text reranker pipeline with Qwen3 models (both the previous and the current approaches work)
Update the README for wwb with the reranker/embedding pipelines
Add use cases to the llm_bench README

Checklist:

  • Tests have been updated or added to cover the new code
  • This patch fully addresses the ticket.
  • I have made corresponding changes to the documentation

@github-actions github-actions bot added category: llm_bench Label for tool/llm_bench folder category: WWB PR changes WWB labels Oct 10, 2025
@sbalandi sbalandi requested a review from as-suvorov October 10, 2025 15:53

@Copilot Copilot AI left a comment


Pull Request Overview

This PR updates the preprocessing for text reranker models and improves documentation across several tools. The main purpose is to simplify the text processing pipeline for Qwen3 models while maintaining backward compatibility, and to enhance the README documentation.

  • Streamlined preprocessing logic for Qwen3 reranker models by removing conditional architecture handling
  • Added comprehensive documentation for reranker and embedding pipelines in the who_what_benchmark tool
  • Enhanced llm_bench README with detailed use cases and parameter explanations

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| tools/who_what_benchmark/whowhatbench/reranking_evaluator.py | Simplified Qwen3 model preprocessing by removing conditional causal LM architecture handling |
| tools/who_what_benchmark/README.md | Added documentation for text reranking and text embedding model comparison workflows |
| tools/llm_bench/task/text_reranker.py | Simplified tokenization logic for Qwen3 models, removing conditional preprocessing |
| tools/llm_bench/benchmark.py | Fixed help text typos for command-line arguments |
| tools/llm_bench/README.md | Comprehensive documentation update with detailed use cases and parameter descriptions |


# post/pre processing for qwen models added according to the transformers Qwen3-Reranker-0.6B model card:
# https://huggingface.co/Qwen/Qwen3-Reranker-0.6B#transformers-usage
if model.config.model_type == "qwen3":
print("NEW WAY")

Copilot AI Oct 10, 2025


Debug print statement should be removed from production code.

Suggested change
print("NEW WAY")

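For context, the simplified Qwen3 preprocessing follows the query/document prompt template described in the Qwen3-Reranker-0.6B model card linked in the snippet above. A minimal sketch, assuming the template wording from that model card (the function name and example inputs here are illustrative, not the actual pipeline code):

```python
# Sketch of the <Instruct>/<Query>/<Document> prompt formatting described in the
# Qwen3-Reranker-0.6B model card; names and example strings are illustrative.

def format_instruction(instruction, query, doc):
    """Build the prompt string that the reranker scores for one query/document pair."""
    if instruction is None:
        # Default task instruction suggested by the model card.
        instruction = ("Given a web search query, retrieve relevant passages "
                       "that answer the query")
    return f"<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {doc}"


pair = format_instruction(None, "What is OpenVINO?", "OpenVINO is an AI toolkit.")
print(pair)
```

Each formatted pair is then tokenized and scored by the model; with this template applied uniformly, no per-architecture conditional handling is needed in the pipeline.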

# Export model to OpenVINO
optimum-cli export openvino -m BAAI/bge-reranker-v2-m3 bge-reranker-v2-m3 --task text-classification

# Collect the references and save the mappling in the .csv file.

Copilot AI Oct 10, 2025


Corrected spelling of 'mappling' to 'mapping'.


Comment on lines +111 to +114
# Reference images will be stored in the "reference" subfolder under the same path with .csv.
wwb --base-model BAAI/bge-small-en-v1.5 --gt-data embed_test/gt.csv --model-type text-embedding --embeds_pooling mean --embeds_normalize --embeds_padding_side "left" --hf
# Compute the metric
# Target images will be stored in the "target" subfolder under the same path with .csv.

Copilot AI Oct 10, 2025


This comment mentions 'images' but should refer to 'references' or 'data' since these are text embedding models, not image processing.

Suggested change
# Reference images will be stored in the "reference" subfolder under the same path with .csv.
wwb --base-model BAAI/bge-small-en-v1.5 --gt-data embed_test/gt.csv --model-type text-embedding --embeds_pooling mean --embeds_normalize --embeds_padding_side "left" --hf
# Compute the metric
# Target images will be stored in the "target" subfolder under the same path with .csv.
# Reference data will be stored in the "reference" subfolder under the same path with .csv.
wwb --base-model BAAI/bge-small-en-v1.5 --gt-data embed_test/gt.csv --model-type text-embedding --embeds_pooling mean --embeds_normalize --embeds_padding_side "left" --hf
# Compute the metric
# Target data will be stored in the "target" subfolder under the same path with .csv.


# Reference images will be stored in the "reference" subfolder under the same path with .csv.
wwb --base-model BAAI/bge-small-en-v1.5 --gt-data embed_test/gt.csv --model-type text-embedding --embeds_pooling mean --embeds_normalize --embeds_padding_side "left" --hf
# Compute the metric
# Target images will be stored in the "target" subfolder under the same path with .csv.

Copilot AI Oct 10, 2025


This comment mentions 'images' but should refer to 'targets' or 'data' since these are text embedding models, not image processing.



```sh
# prompt lookup decoding
python benchmark.py -m models/llama-2-7b-chat/ -p "What is openvino?" -n 2 --task text_gen --max_ngram_siz 3 --num_assistant_tokens 5

Copilot AI Oct 10, 2025


Corrected spelling of 'max_ngram_siz' to 'max_ngram_size'.

Suggested change
python benchmark.py -m models/llama-2-7b-chat/ -p "What is openvino?" -n 2 --task text_gen --max_ngram_siz 3 --num_assistant_tokens 5
python benchmark.py -m models/llama-2-7b-chat/ -p "What is openvino?" -n 2 --task text_gen --max_ngram_size 3 --num_assistant_tokens 5


# load speaker embeddings
wget https://huggingface.co/datasets/Xenova/cmu-arctic-xvectors-extracted/resolve/main/cmu_us_awb_arctic-wav-arctic_a0001.bin
# run benchmark.py
python benchmark.py -m models/speecht5_tts/ -p "Hello OpenVINO GenAI" -n 2 --task speech_to_text --speaker_embeddings ./cmu_us_awb_arctic-wav-arctic_a0001.bin

Copilot AI Oct 10, 2025


Incorrect task type: should be 'text_to_speech' instead of 'speech_to_text' for a TTS model.

Suggested change
python benchmark.py -m models/speecht5_tts/ -p "Hello OpenVINO GenAI" -n 2 --task speech_to_text --speaker_embeddings ./cmu_us_awb_arctic-wav-arctic_a0001.bin
python benchmark.py -m models/speecht5_tts/ -p "Hello OpenVINO GenAI" -n 2 --task text_to_speech --speaker_embeddings ./cmu_us_awb_arctic-wav-arctic_a0001.bin


# load audio
wget https://storage.openvinotoolkit.org/models_contrib/speech/2021.2/librispeech_s5/how_are_you_doing_today.wav
# run benchmark.py
python benchmark.py -m models/whisper-base/ -p ./how_are_you_doing_today.wav -n 2 --task text_to_speech

Copilot AI Oct 10, 2025


Incorrect task type: should be 'speech_to_text' instead of 'text_to_speech' for a Whisper STT model.

Suggested change
python benchmark.py -m models/whisper-base/ -p ./how_are_you_doing_today.wav -n 2 --task text_to_speech
python benchmark.py -m models/whisper-base/ -p ./how_are_you_doing_today.wav -n 2 --task speech_to_text


Comment on lines +323 to +327
## 8. Memory constipation mode
Enables memory usage information collection mode. This mode is affect of execution time, so it is not recommended to run memory consumption and performance benchmarking at the same time. Effect on performance can be reduced by specifying a longer --memory_consumption_delay, but the impact is still expected.

```sh
# run benchmark.py in memory constipation mode

Copilot AI Oct 10, 2025


Corrected spelling of 'constipation' to 'consumption'.

Suggested change
## 8. Memory constipation mode
Enables memory usage information collection mode. This mode is affect of execution time, so it is not recommended to run memory consumption and performance benchmarking at the same time. Effect on performance can be reduced by specifying a longer --memory_consumption_delay, but the impact is still expected.
```sh
# run benchmark.py in memory constipation mode
## 8. Memory consumption mode
Enables memory usage information collection mode. This mode affects execution time, so it is not recommended to run memory consumption and performance benchmarking at the same time. Effect on performance can be reduced by specifying a longer --memory_consumption_delay, but the impact is still expected.
```sh
# run benchmark.py in memory consumption mode


Enables memory usage information collection mode. This mode is affect of execution time, so it is not recommended to run memory consumption and performance benchmarking at the same time. Effect on performance can be reduced by specifying a longer --memory_consumption_delay, but the impact is still expected.

```sh
# run benchmark.py in memory constipation mode

Copilot AI Oct 10, 2025


Corrected spelling of 'constipation' to 'consumption'.

Suggested change
# run benchmark.py in memory constipation mode
# run benchmark.py in memory consumption mode


**Parameters:**
- `-mc, --memory_consumption`: Enables memory usage information collection mode. If the value is 1, output the maximum memory consumption in warm-up iterations. If the value is 2, output the maximum memory consumption in all iterations.
- `--memory_consumption_delay`: Delay for memory consumption check in seconds, smaller value will lead to more precised memory consumption, but may affects performance.
- `-mc_dir, --memory_consumption_dir`: Path to store memory consamption logs and chart.

Copilot AI Oct 10, 2025


Corrected spelling of 'consamption' to 'consumption'.

Suggested change
- `-mc_dir, --memory_consumption_dir`: Path to store memory consamption logs and chart.
- `-mc_dir, --memory_consumption_dir`: Path to store memory consumption logs and chart.

