110 changes: 92 additions & 18 deletions weave.mdx
@@ -4,30 +4,104 @@ description: "Track, test, and improve language model apps with W&B Weave"
mode: wide
---

<Info>
W&B Inference comes free with your account. Get access to open source models through the API and Weave Playground.
- [Quickstart](/weave/quickstart-inference)
- [Product page](https://wandb.ai/site/inference)
</Info>
W&B Weave is an observability and evaluation platform that helps you track, evaluate, and improve your LLM application's performance. With Weave, you can:

W&B Weave helps you build better language model apps. Use Weave to track, test, and improve your apps:
* [Trace](/weave/quickstart) your application's LLM calls, capturing inputs, outputs, costs, and latency
* [Evaluate](/weave/guides/core-types/evaluations) and [monitor](/weave/guides/evaluation/guardrails_and_monitors) your application's responses using scorers and LLM judges
* [Log versions](/weave/tutorial-weave_models) of your application's code, prompts, datasets, and other attributes
* [Create leaderboards](/weave/guides/core-types/leaderboards) to track and compare your application's performance over time
* [Integrate Weave into your W&B reinforcement-learning training runs](/weave/guides/tools/weave-in-workspaces) to gain observability into how your models perform during training

- **Track & Watch**: See how your language model calls work in live systems.
- **Test Changes**: Try new prompts, data, and models safely.
- **Run Tests**: Test models and prompts in the Playground.
- **Check Performance**: Use response evaluation tools to track and measure how well your LLM app performs.
- **Add Safety**: Protect your app with content filters and prompt guards.
Weave works with many [popular frameworks](/weave/guides/integrations) and has both [Python](/weave/reference/python-sdk) and [TypeScript SDKs](/weave/reference/typescript-sdk).
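
For example, tracing a call is as simple as decorating a function with `@weave.op`. The following is a minimal sketch drawn from the larger evaluation example later on this page; the project name, function name, and prompt are placeholders:

```python
import weave
from openai import OpenAI

# Initialize Weave with a project name (placeholder)
weave.init("my-first-project")

client = OpenAI()

# Decorating a function with @weave.op records its inputs, outputs,
# latency, and the LLM calls made inside it as a trace
@weave.op
def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

ask("What is the capital of France?")
```

Each call to `ask` then appears as a trace in the Weave UI, with its inputs, outputs, cost, and latency captured automatically.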

Connect Weave to your code with:
- [Python SDK](/weave/reference/python-sdk)
- [TypeScript SDK](/weave/reference/typescript-sdk)
- [Service API](/weave/reference/service-api)
## Get Started

Weave works with many language model providers, local models, and tools.
See the following quickstart docs to install Weave and learn how to integrate it into your code:

## Get started
* [Track LLM inputs and outputs](/weave/quickstart)
* [Learn Weave with W&B Inference](/weave/quickstart-inference)

New to Weave? Start with the [Python quickstart](/weave/quickstart) or TypeScript quickstart.
You can also review the following Python example for a quick overview of how Weave fits into your code:

<Accordion title="Send requests to OpenAI and evaluate their responses" >

The following example sends simple math questions to OpenAI and then evaluates the responses for correctness in parallel using a custom class-based `CorrectnessScorer`:

<a target="_blank" href="https://colab.research.google.com/github/wandb/docs/blob/main/weave/cookbooks/source/intro_page_example.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

```python lines
import weave
from openai import OpenAI
from weave import Scorer
import asyncio

# Initialize Weave
weave.init("parallel-evaluation")

# Create OpenAI client
client = OpenAI()

# Define your model as a weave.op function
@weave.op
def math_model(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content

# Create a dataset with questions and expected answers
dataset = [
    {"question": "What is 2+2?", "expected": "4"},
    {"question": "What is 5+3?", "expected": "8"},
    {"question": "What is 10-7?", "expected": "3"},
    {"question": "What is 12*3?", "expected": "36"},
    {"question": "What is 100/4?", "expected": "25"},
]

# Define a class-based scorer
class CorrectnessScorer(Scorer):
    """Scorer that checks if the answer is correct"""

    @weave.op
    def score(self, question: str, expected: str, output: str) -> dict:
        """Check if the model output contains the expected answer"""
        import re

        # Extract numbers from the output
        numbers = re.findall(r'\d+', output)

        if numbers:
            answer = numbers[0]
            correct = answer == expected
        else:
            correct = False

        return {
            "correct": correct,
            "extracted_answer": numbers[0] if numbers else None,
            "contains_expected": expected in output
        }

# Instantiate the scorer
correctness_scorer = CorrectnessScorer()

# Create an evaluation
evaluation = weave.Evaluation(
    dataset=dataset,
    scorers=[correctness_scorer]
)

# Run the evaluation - automatically evaluates examples in parallel
asyncio.run(evaluation.evaluate(math_model))
```

To use this example, follow the [installation instructions](/weave/quickstart#1-install-w%26b-weave-and-create-an-api-key) in the first step of the quickstart. You also need an [OpenAI API key](https://platform.openai.com/api-keys).
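
If you run this script locally rather than in the notebook, a snippet like the following (mirroring the notebook's setup cell) lets you supply the key interactively before calling `weave.init`:

```python
import os
from getpass import getpass

# Prompt for an OpenAI API key only if one isn't already set in the environment
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")
```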
</Accordion>

## Advanced guides

188 changes: 188 additions & 0 deletions weave/cookbooks/source/intro_page_example.ipynb
@@ -0,0 +1,188 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Parallel Evaluation with W&B Weave\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/wandb/docs/blob/main/weave/cookbooks/source/parallel_evaluation_example.ipynb)\n",
"\n",
"This notebook demonstrates how to use W&B Weave to send math questions to OpenAI and evaluate the responses for correctness in parallel.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Installation\n",
"\n",
"First, install the required packages:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install weave openai -qU\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup API Keys\n",
"\n",
"Add your W&B and OpenAI API keys:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from getpass import getpass\n",
"\n",
"# Set your OpenAI API key\n",
"if not os.environ.get(\"OPENAI_API_KEY\"):\n",
" os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter your OpenAI API key: \")\n",
"\n",
"# Log in to W&B\n",
"import wandb\n",
"wandb.login()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Parallel Evaluation Example\n",
"\n",
"Run the evaluation example:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import weave\n",
"from openai import OpenAI\n",
"from weave import Scorer\n",
"import asyncio\n",
"\n",
"# Initialize Weave\n",
"weave.init(\"parallel-evaluation\")\n",
"\n",
"# Create OpenAI client\n",
"client = OpenAI()\n",
"\n",
"# Define your model as a weave.op function\n",
"@weave.op\n",
"def math_model(question: str) -> str:\n",
" response = client.chat.completions.create(\n",
" model=\"gpt-4\",\n",
" messages=[\n",
" {\"role\": \"user\", \"content\": question}\n",
" ]\n",
" )\n",
" return response.choices[0].message.content\n",
"\n",
"# Create a dataset with questions and expected answers\n",
"dataset = [\n",
" {\"question\": \"What is 2+2?\", \"expected\": \"4\"},\n",
" {\"question\": \"What is 5+3?\", \"expected\": \"8\"},\n",
" {\"question\": \"What is 10-7?\", \"expected\": \"3\"},\n",
" {\"question\": \"What is 12*3?\", \"expected\": \"36\"},\n",
" {\"question\": \"What is 100/4?\", \"expected\": \"25\"},\n",
"]\n",
"\n",
"# Define a class-based scorer\n",
"class CorrectnessScorer(Scorer):\n",
" \"\"\"Scorer that checks if the answer is correct\"\"\"\n",
" \n",
" @weave.op\n",
" def score(self, question: str, expected: str, output: str) -> dict:\n",
" \"\"\"Check if the model output contains the expected answer\"\"\"\n",
" import re\n",
" \n",
" # Extract numbers from the output\n",
" numbers = re.findall(r'\\d+', output)\n",
" \n",
" if numbers:\n",
" answer = numbers[0]\n",
" correct = answer == expected\n",
" else:\n",
" correct = False\n",
" \n",
" return {\n",
" \"correct\": correct,\n",
" \"extracted_answer\": numbers[0] if numbers else None,\n",
" \"contains_expected\": expected in output\n",
" }\n",
"\n",
"# Instantiate the scorer\n",
"correctness_scorer = CorrectnessScorer()\n",
"\n",
"# Create an evaluation\n",
"evaluation = weave.Evaluation(\n",
" dataset=dataset,\n",
" scorers=[correctness_scorer]\n",
")\n",
"\n",
"# Run the evaluation - automatically evaluates examples in parallel\n",
"await evaluation.evaluate(math_model)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Note for Google Colab Users\n",
"\n",
"If you're running this notebook in Google Colab, you may need to handle async differently. Use this version instead:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# For Google Colab, use this approach:\n",
"import nest_asyncio\n",
"nest_asyncio.apply()\n",
"\n",
"# Then run the evaluation\n",
"asyncio.run(evaluation.evaluate(math_model))\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## View Results\n",
"\n",
"After running the evaluation, you can view the results in the W&B Weave dashboard. The evaluation shows:\n",
"\n",
"1. **Parallel execution**: All examples are evaluated simultaneously for faster results\n",
"2. **Correctness scores**: Each response is scored based on whether it contains the correct answer\n",
"3. **Detailed metrics**: Including extracted answers and whether the expected value was found\n",
"\n",
"Visit your [W&B Weave dashboard](https://wandb.ai/home) to explore the evaluation results in detail.\n"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}