
Conversation


@sbalandi sbalandi commented Oct 10, 2025

Description

Simplify the preprocessing step for the text reranker pipeline with Qwen3 models (both the previous and the current approaches work)
Update the README for wwb with the reranker/embedding pipelines
Add use cases to the llm_bench README

Checklist:

  • Tests have been updated or added to cover the new code
  • This patch fully addresses the ticket.
  • I have made corresponding changes to the documentation

@github-actions github-actions bot added category: llm_bench Label for tool/llm_bench folder category: WWB PR changes WWB labels Oct 10, 2025
@sbalandi sbalandi requested a review from as-suvorov October 10, 2025 15:53

@Copilot Copilot AI left a comment


Pull Request Overview

This PR updates the preprocessing for text reranker models and improves documentation across several tools. The main purpose is to simplify the text processing pipeline for Qwen3 models while maintaining backward compatibility, and to enhance the README documentation.

  • Streamlined preprocessing logic for Qwen3 reranker models by removing conditional architecture handling
  • Added comprehensive documentation for reranker and embedding pipelines in the who_what_benchmark tool
  • Enhanced llm_bench README with detailed use cases and parameter explanations

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| tools/who_what_benchmark/whowhatbench/reranking_evaluator.py | Simplified Qwen3 model preprocessing by removing conditional causal LM architecture handling |
| tools/who_what_benchmark/README.md | Added documentation for text reranking and text embedding model comparison workflows |
| tools/llm_bench/task/text_reranker.py | Simplified tokenization logic for Qwen3 models, removing conditional preprocessing |
| tools/llm_bench/benchmark.py | Fixed help text typos for command-line arguments |
| tools/llm_bench/README.md | Comprehensive documentation update with detailed use cases and parameter descriptions |


# post/pre processing for qwen models added according to the transformers Qwen3-Reranker-0.6B model card:
# https://huggingface.co/Qwen/Qwen3-Reranker-0.6B#transformers-usage
if model.config.model_type == "qwen3":
print("NEW WAY")

Copilot AI Oct 10, 2025


Debug print statement should be removed from production code.

Suggested change
print("NEW WAY")

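For context, the simplified Qwen3 preprocessing follows the query/document prompt template described in the Qwen3-Reranker-0.6B model card linked in the snippet above. A minimal sketch, assuming the template wording from that model card (the function name and example inputs here are illustrative, not the actual pipeline code):

```python
# Sketch of the <Instruct>/<Query>/<Document> prompt formatting described in the
# Qwen3-Reranker-0.6B model card; names and example strings are illustrative.

def format_instruction(instruction, query, doc):
    """Build the prompt string that the reranker scores for one query/document pair."""
    if instruction is None:
        # Default task instruction suggested by the model card.
        instruction = ("Given a web search query, retrieve relevant passages "
                       "that answer the query")
    return f"<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {doc}"


pair = format_instruction(None, "What is OpenVINO?", "OpenVINO is an AI toolkit.")
print(pair)
```

Each formatted pair is then tokenized and scored by the model; with this template applied uniformly, no per-architecture conditional handling is needed in the pipeline.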

# Export model to OpenVINO
optimum-cli export openvino -m BAAI/bge-reranker-v2-m3 bge-reranker-v2-m3 --task text-classification

# Collect the references and save the mappling in the .csv file.

Copilot AI Oct 10, 2025


Corrected spelling of 'mappling' to 'mapping'.


Comment on lines +111 to +114
# Reference images will be stored in the "reference" subfolder under the same path with .csv.
wwb --base-model BAAI/bge-small-en-v1.5 --gt-data embed_test/gt.csv --model-type text-embedding --embeds_pooling mean --embeds_normalize --embeds_padding_side "left" --hf
# Compute the metric
# Target images will be stored in the "target" subfolder under the same path with .csv.

Copilot AI Oct 10, 2025


This comment mentions 'images' but should refer to 'references' or 'data' since these are text embedding models, not image processing.

Suggested change
# Reference images will be stored in the "reference" subfolder under the same path with .csv.
wwb --base-model BAAI/bge-small-en-v1.5 --gt-data embed_test/gt.csv --model-type text-embedding --embeds_pooling mean --embeds_normalize --embeds_padding_side "left" --hf
# Compute the metric
# Target images will be stored in the "target" subfolder under the same path with .csv.
# Reference data will be stored in the "reference" subfolder under the same path with .csv.
wwb --base-model BAAI/bge-small-en-v1.5 --gt-data embed_test/gt.csv --model-type text-embedding --embeds_pooling mean --embeds_normalize --embeds_padding_side "left" --hf
# Compute the metric
# Target data will be stored in the "target" subfolder under the same path with .csv.


# Reference images will be stored in the "reference" subfolder under the same path with .csv.
wwb --base-model BAAI/bge-small-en-v1.5 --gt-data embed_test/gt.csv --model-type text-embedding --embeds_pooling mean --embeds_normalize --embeds_padding_side "left" --hf
# Compute the metric
# Target images will be stored in the "target" subfolder under the same path with .csv.

Copilot AI Oct 10, 2025


This comment mentions 'images' but should refer to 'targets' or 'data' since these are text embedding models, not image processing.



```sh
# prompt lookup decoding
python benchmark.py -m models/llama-2-7b-chat/ -p "What is openvino?" -n 2 --task text_gen --max_ngram_siz 3 --num_assistant_tokens 5

Copilot AI Oct 10, 2025


Corrected spelling of 'max_ngram_siz' to 'max_ngram_size'.

Suggested change
python benchmark.py -m models/llama-2-7b-chat/ -p "What is openvino?" -n 2 --task text_gen --max_ngram_siz 3 --num_assistant_tokens 5
python benchmark.py -m models/llama-2-7b-chat/ -p "What is openvino?" -n 2 --task text_gen --max_ngram_size 3 --num_assistant_tokens 5


# load speaker embeddings
wget https://huggingface.co/datasets/Xenova/cmu-arctic-xvectors-extracted/resolve/main/cmu_us_awb_arctic-wav-arctic_a0001.bin
# run benchmark.py
python benchmark.py -m models/speecht5_tts/ -p "Hello OpenVINO GenAI" -n 2 --task speech_to_text --speaker_embeddings ./cmu_us_awb_arctic-wav-arctic_a0001.bin

Copilot AI Oct 10, 2025


Incorrect task type: should be 'text_to_speech' instead of 'speech_to_text' for a TTS model.

Suggested change
python benchmark.py -m models/speecht5_tts/ -p "Hello OpenVINO GenAI" -n 2 --task speech_to_text --speaker_embeddings ./cmu_us_awb_arctic-wav-arctic_a0001.bin
python benchmark.py -m models/speecht5_tts/ -p "Hello OpenVINO GenAI" -n 2 --task text_to_speech --speaker_embeddings ./cmu_us_awb_arctic-wav-arctic_a0001.bin


# load audio
wget https://storage.openvinotoolkit.org/models_contrib/speech/2021.2/librispeech_s5/how_are_you_doing_today.wav
# run benchmark.py
python benchmark.py -m models/whisper-base/ -p ./how_are_you_doing_today.wav -n 2 --task text_to_speech

Copilot AI Oct 10, 2025


Incorrect task type: should be 'speech_to_text' instead of 'text_to_speech' for a Whisper STT model.

Suggested change
python benchmark.py -m models/whisper-base/ -p ./how_are_you_doing_today.wav -n 2 --task text_to_speech
python benchmark.py -m models/whisper-base/ -p ./how_are_you_doing_today.wav -n 2 --task speech_to_text


Comment on lines +323 to +327
## 8. Memory constipation mode
Enables memory usage information collection mode. This mode is affect of execution time, so it is not recommended to run memory consumption and performance benchmarking at the same time. Effect on performance can be reduced by specifying a longer --memory_consumption_delay, but the impact is still expected.

```sh
# run benchmark.py in memory constipation mode

Copilot AI Oct 10, 2025


Corrected spelling of 'constipation' to 'consumption'.

Suggested change
## 8. Memory constipation mode
Enables memory usage information collection mode. This mode is affect of execution time, so it is not recommended to run memory consumption and performance benchmarking at the same time. Effect on performance can be reduced by specifying a longer --memory_consumption_delay, but the impact is still expected.
```sh
# run benchmark.py in memory constipation mode
## 8. Memory consumption mode
Enables memory usage information collection mode. This mode affects execution time, so it is not recommended to run memory consumption and performance benchmarking at the same time. Effect on performance can be reduced by specifying a longer --memory_consumption_delay, but the impact is still expected.
```sh
# run benchmark.py in memory consumption mode


Enables memory usage information collection mode. This mode is affect of execution time, so it is not recommended to run memory consumption and performance benchmarking at the same time. Effect on performance can be reduced by specifying a longer --memory_consumption_delay, but the impact is still expected.

```sh
# run benchmark.py in memory constipation mode

Copilot AI Oct 10, 2025


Corrected spelling of 'constipation' to 'consumption'.

Suggested change
# run benchmark.py in memory constipation mode
# run benchmark.py in memory consumption mode


**Parameters:**
- `-mc, --memory_consumption`: Enables memory usage information collection mode. If the value is 1, output the maximum memory consumption in warm-up iterations. If the value is 2, output the maximum memory consumption in all iterations.
- `--memory_consumption_delay`: Delay for memory consumption check in seconds, smaller value will lead to more precised memory consumption, but may affects performance.
- `-mc_dir, --memory_consumption_dir`: Path to store memory consamption logs and chart.

Copilot AI Oct 10, 2025


Corrected spelling of 'consamption' to 'consumption'.

Suggested change
- `-mc_dir, --memory_consumption_dir`: Path to store memory consamption logs and chart.
- `-mc_dir, --memory_consumption_dir`: Path to store memory consumption logs and chart.

