308 changes: 264 additions & 44 deletions samples/js/package-lock.json

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion samples/js/package.json
@@ -4,7 +4,8 @@
"license": "Apache-2.0",
"type": "module",
"devDependencies": {
"openvino-genai-node": "^2025.4.0"
"openvino-genai-node": "^2025.4.0",
"yargs": "^18.0.0"
},
"engines": {
"node": ">=21.0.0"
31 changes: 29 additions & 2 deletions samples/js/text_generation/README.md
@@ -29,9 +29,16 @@ and architectures, we still recommend converting the model to the IR format usin
## Sample Descriptions
### Common information

Compile GenAI JavaScript bindings archive first using the instructions in [../../../src/js/README.md](../../../src/js/README.md#build-bindings).
When you use the [openvino.genai](https://github.com/openvinotoolkit/openvino.genai) **release branch**, install dependencies before running samples.
In the current directory, run:
```bash
npm install
```

If you use the master branch, you may have to follow
[these instructions](../../../src/js/README.md#build-bindings)
to build the latest version of `openvino-genai-node` from source first, then install dependencies.

Run `npm install` and the examples will be ready to run.

Discrete GPUs (dGPUs) usually provide better performance compared to CPUs. It is recommended to run larger models on a dGPU with 32GB+ RAM. For example, the model meta-llama/Llama-2-13b-chat-hf can benefit from being run on a dGPU. Modify the source code to change the device for inference to the GPU.

@@ -92,6 +99,26 @@ Recommended models: Qwen/Qwen2.5-3B-Instruct, Qwen/Qwen2.5-7B-Instruct
node react_sample.js model_dir
```

### 6. LLMs benchmarking sample (`benchmark_genai`)
- **Description:**
This sample script demonstrates how to benchmark LLMs in OpenVINO GenAI. The script includes functionality for warm-up iterations, generating text, and calculating various performance metrics.

For more information on how performance metrics are calculated, please follow the [performance-metrics tutorial](../../../src/README.md#performance-metrics).
- **Main Feature:** Benchmark model via GenAI
- **Run Command:**
```bash
node benchmark_genai.js [-m MODEL] [-p PROMPT] [--nw NUM_WARMUP] [-n NUM_ITER] [--mt MAX_NEW_TOKENS] [-d DEVICE]
```

#### Options
- `-m`, `--model`: Path to model and tokenizers base directory. [string] [required]
- `-p`, `--prompt`: The prompt used to generate text. If neither `-p` nor `--pf` is given, the default prompt is `The Sky is blue because`. [string]
- `--prompt_file`, `--pf`: Read prompt from file. [string]
- `--num_warmup`, `--nw`: Number of warmup iterations. [number] [default: 1]
- `-n`, `--num_iter`: Number of iterations. [number] [default: 2]
- `--max_new_tokens`, `--mt`: Maximum number of new tokens. [number] [default: 20]
- `-d`, `--device`: Device to run the model on. [string] [default: "CPU"]
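
For example, assuming the model has been exported to a local `TinyLlama-1.1B-Chat-v1.0` directory (the path is only illustrative), the sample can be run as:
```bash
node benchmark_genai.js -m ./TinyLlama-1.1B-Chat-v1.0 -p "Why is the sky blue?" --nw 2 -n 5 --mt 64 -d CPU
```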

### Troubleshooting

#### Unicode characters encoding error on Windows
111 changes: 111 additions & 0 deletions samples/js/text_generation/benchmark_genai.js
@@ -0,0 +1,111 @@
// Copyright (C) 2025 Intel Corporation
// SPDX-License-Identifier: Apache-2.0

import { LLMPipeline } from "openvino-genai-node";
import yargs from "yargs/yargs";
import { hideBin } from "yargs/helpers";
import { readFileSync } from "fs";

main();

async function main() {
const argv = yargs(hideBin(process.argv))
.option("model", {
alias: "m",
type: "string",
demandOption: true,
describe: "Path to model and tokenizers base directory.",
})
.option("prompt", {
alias: "p",
type: "string",
describe:
"The prompt used to generate text. If neither `-p` nor `--pf` is given, the default prompt is `The Sky is blue because`.",
})
.option("prompt_file", {
alias: "pf",
type: "string",
describe: "Read prompt from file.",
})
.option("num_warmup", {
alias: "nw",
type: "number",
default: 1,
describe: "Number of warmup iterations.",
})
.option("num_iter", {
alias: "n",
type: "number",
default: 2,
describe: "Number of iterations.",
})
.option("max_new_tokens", {
alias: "mt",
type: "number",
default: 20,
describe: "Maximal number of new tokens.",
})
.option("device", {
alias: "d",
type: "string",
default: "CPU",
describe: "Device.",
})
.parse();

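// Resolve the prompt: --prompt and --prompt_file are mutually exclusive; fall back to a default prompt.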
let prompt;
if (argv.prompt !== undefined && argv.prompt_file !== undefined) {
console.error(`Prompt and prompt file should not be used together!`);
process.exit(1);
} else {
if (argv.prompt_file !== undefined) {
prompt = [readFileSync(argv.prompt_file, "utf-8")];
} else {
prompt = argv.prompt === undefined ? ["The Sky is blue because"] : [argv.prompt];
}
}
if (prompt.length === 0 || prompt[0].trim() === "") {
throw new Error("Prompt is empty!");
}

const modelsPath = argv.model;
const { device } = argv;
const numWarmup = argv.num_warmup;
const numIter = argv.num_iter;

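// Request decoded results so each generate() call exposes perfMetrics on its return value.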
const config = {
max_new_tokens: argv.max_new_tokens,
return_decoded_results: true,
};

let pipe;
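// The NPU pipeline is created with its defaults; on other devices prefix caching is disabled
// and the batched-token limit is effectively removed so each iteration is measured without reusing cached prefixes.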
if (device === "NPU") {
pipe = await LLMPipeline(modelsPath, device);
} else {
const schedulerConfig = {
enable_prefix_caching: false,
max_num_batched_tokens: Number.MAX_SAFE_INTEGER,
};
pipe = await LLMPipeline(modelsPath, device, { schedulerConfig: schedulerConfig });
}

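// Warm-up runs: their timings are not included in the reported metrics.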
for (let i = 0; i < numWarmup; i++) {
await pipe.generate(prompt, config);
}

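// Measured runs: keep the first result's metrics and fold the remaining iterations in via add().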
let res = await pipe.generate(prompt, config);
let { perfMetrics } = res;
for (let i = 0; i < numIter - 1; i++) {
res = await pipe.generate(prompt, config);
perfMetrics.add(res.perfMetrics);
}

console.log(`Output token size: ${perfMetrics.getNumGeneratedTokens()}`);
console.log(`Load time: ${perfMetrics.getLoadTime()} ms`);
console.log(`Generate time: ${perfMetrics.getGenerateDuration().mean} ± ${perfMetrics.getGenerateDuration().std} ms`);
console.log(`Tokenization time: ${perfMetrics.getTokenizationDuration().mean} ± ${perfMetrics.getTokenizationDuration().std} ms`);
console.log(`Detokenization time: ${perfMetrics.getDetokenizationDuration().mean} ± ${perfMetrics.getDetokenizationDuration().std} ms`);
console.log(`TTFT: ${perfMetrics.getTTFT().mean} ± ${perfMetrics.getTTFT().std} ms`);
console.log(`TPOT: ${perfMetrics.getTPOT().mean} ± ${perfMetrics.getTPOT().std} ms`);
console.log(`Throughput: ${perfMetrics.getThroughput().mean} ± ${perfMetrics.getThroughput().std} tokens/s`);
}
4 changes: 2 additions & 2 deletions samples/python/text_generation/README.md
@@ -185,9 +185,9 @@ LLMPipeline and Tokenizer objects can be initialized directly from the memory bu

### 9. LLMs benchmarking sample (`benchmark_genai`)
- **Description:**
This sample script demonstrates how to benchmark an LLMs in OpenVINO GenAI. The script includes functionality for warm-up iterations, generating text, and calculating various performance metrics.
This sample script demonstrates how to benchmark LLMs in OpenVINO GenAI. The script includes functionality for warm-up iterations, generating text, and calculating various performance metrics.

For more information how performance metrics are calculated please follow [performance-metrics tutorial](../../../src/README.md#performance-metrics).
For more information on how performance metrics are calculated, please follow the [performance-metrics tutorial](../../../src/README.md#performance-metrics).
- **Main Feature:** Benchmark model via GenAI
- **Run Command:**
```bash
11 changes: 11 additions & 0 deletions src/js/include/helper.hpp
@@ -47,6 +47,17 @@ ov::genai::ChatHistory js_to_cpp<ov::genai::ChatHistory>(const Napi::Env& env, c
template <>
ov::genai::SchedulerConfig js_to_cpp<ov::genai::SchedulerConfig>(const Napi::Env& env, const Napi::Value& value);

/**
 * @brief Unwraps a C++ object from a JavaScript wrapper.
 * @tparam TargetType The C++ class type to extract.
 * @param env The N-API environment.
 * @param value The JavaScript value that wraps the C++ object.
 * @return Reference to the unwrapped C++ object.
 */
template <typename TargetType>
TargetType& unwrap(const Napi::Env& env, const Napi::Value& value);

template <>
ov::genai::PerfMetrics& unwrap<ov::genai::PerfMetrics>(const Napi::Env& env, const Napi::Value& value);
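// Example usage (illustrative): auto& metrics = unwrap<ov::genai::PerfMetrics>(env, info[0]);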

/**
 * @brief Template function to convert C++ data types into JavaScript data types
 * @tparam TargetType Destination JavaScript data type.
2 changes: 2 additions & 0 deletions src/js/include/perf_metrics.hpp
@@ -28,6 +28,8 @@ class PerfMetricsWrapper : public Napi::ObjectWrap<PerfMetricsWrapper> {
Napi::Value get_grammar_compile_time(const Napi::CallbackInfo& info);

Napi::Value get_raw_metrics(const Napi::CallbackInfo& info);
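/** Implements the JavaScript PerfMetrics.add() method: accumulates another PerfMetrics object into this one. */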
Napi::Value add(const Napi::CallbackInfo& info);
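/** Returns a reference to the wrapped ov::genai::PerfMetrics. */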
ov::genai::PerfMetrics& get_value();

private:
ov::genai::PerfMetrics _metrics;
5 changes: 5 additions & 0 deletions src/js/lib/pipelines/llmPipeline.ts
@@ -118,6 +118,11 @@ export interface PerfMetrics {
getGrammarCompileTime(): SummaryStats;
/** A structure of RawPerfMetrics type that holds raw metrics. */
rawMetrics: RawMetrics;

/** Adds the metrics from another PerfMetrics object to this one.
* @returns The current PerfMetrics instance.
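 * @example
 * // resultA and resultB are DecodedResults from two generate() calls (illustrative)
 * resultA.perfMetrics.add(resultB.perfMetrics);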
*/
add(other: PerfMetrics): this;
}

export class DecodedResults {
16 changes: 16 additions & 0 deletions src/js/src/helper.cpp
@@ -1,5 +1,8 @@
#include "include/helper.hpp"

#include "include/addon.hpp"
#include "include/perf_metrics.hpp"

namespace {
constexpr const char* JS_SCHEDULER_CONFIG_KEY = "schedulerConfig";
constexpr const char* CPP_SCHEDULER_CONFIG_KEY = "scheduler_config";
@@ -186,6 +189,19 @@ ov::genai::SchedulerConfig js_to_cpp<ov::genai::SchedulerConfig>(const Napi::Env
return config;
}

template <>
ov::genai::PerfMetrics& unwrap<ov::genai::PerfMetrics>(const Napi::Env& env, const Napi::Value& value) {
const auto obj = value.As<Napi::Object>();
const auto& prototype = env.GetInstanceData<AddonData>()->perf_metrics;

OPENVINO_ASSERT(prototype, "Invalid pointer to prototype.");
OPENVINO_ASSERT(obj.InstanceOf(prototype.Value().As<Napi::Function>()),
"Passed argument is not of type PerfMetrics");

const auto js_metrics = Napi::ObjectWrap<PerfMetricsWrapper>::Unwrap(obj);
return js_metrics->get_value();
}

template <>
Napi::Value cpp_to_js<ov::genai::EmbeddingResult, Napi::Value>(const Napi::Env& env,
const ov::genai::EmbeddingResult embedding_result) {
16 changes: 16 additions & 0 deletions src/js/src/perf_metrics.cpp
@@ -30,6 +30,7 @@ Napi::Function PerfMetricsWrapper::get_class(Napi::Env env) {
InstanceMethod("getGrammarCompilerInitTimes", &PerfMetricsWrapper::get_grammar_compiler_init_times),
InstanceMethod("getGrammarCompileTime", &PerfMetricsWrapper::get_grammar_compile_time),
InstanceAccessor<&PerfMetricsWrapper::get_raw_metrics>("rawMetrics"),
InstanceMethod("add", &PerfMetricsWrapper::add),
});
}

@@ -167,3 +168,18 @@ Napi::Value PerfMetricsWrapper::get_raw_metrics(const Napi::CallbackInfo& info)

return obj;
}

Napi::Value PerfMetricsWrapper::add(const Napi::CallbackInfo& info) {
VALIDATE_ARGS_COUNT(info, 1, "add()");
const auto env = info.Env();
try {
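// ov::genai::PerfMetrics defines operator+=, which merges the other object's measurements into this one.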
_metrics += unwrap<ov::genai::PerfMetrics>(env, info[0]);
} catch (const std::exception& ex) {
Napi::TypeError::New(env, ex.what()).ThrowAsJavaScriptException();
}
return info.This();
}

ov::genai::PerfMetrics& PerfMetricsWrapper::get_value() {
return _metrics;
}
22 changes: 22 additions & 0 deletions src/js/tests/module.test.js
@@ -289,6 +289,28 @@ describe("LLMPipeline.generate()", () => {
assert.ok(perfMetrics.rawMetrics.inferenceDurations.length > 0);
assert.ok(perfMetrics.rawMetrics.grammarCompileTimes.length === 0);
});

it("test perfMetrics.add()", async () => {
const config = {
max_new_tokens: 5,
return_decoded_results: true,
};
const res1 = await pipeline.generate("prompt1", config);
const res2 = await pipeline.generate("prompt2", config);

const perfMetrics1 = res1.perfMetrics;
const perfMetrics2 = res2.perfMetrics;

const totalNumGeneratedTokens =
perfMetrics1.getNumGeneratedTokens() + perfMetrics2.getNumGeneratedTokens();

perfMetrics1.add(perfMetrics2);
assert.strictEqual(perfMetrics1.getNumGeneratedTokens(), totalNumGeneratedTokens);

assert.throws(() => perfMetrics1.add({}), {
message: /Passed argument is not of type PerfMetrics/,
});
});
});

describe("stream()", () => {