Commit 5cf277b
Weave intro page rewrite (#1819)
## Description

This rewrites the W&B Weave intro page to:

* Improve clarity
* Remove the Inference banner from the top of the page
* Get users to the quickstart info faster
* Make the page shorter and more concise
1 parent 9f8aee4 commit 5cf277b

File tree

2 files changed (+280, -18 lines)

weave.mdx

Lines changed: 92 additions & 18 deletions
@@ -4,30 +4,104 @@ description: "Track, test, and improve language model apps with W&B Weave"
 mode: wide
 ---
 
-<Info>
-W&B Inference comes free with your account. Get access to open source models through the API and Weave Playground.
-- [Quickstart](/weave/quickstart-inference)
-- [Product page](https://wandb.ai/site/inference)
-</Info>
+W&B Weave is an observability and evaluation platform that helps you track, evaluate, and improve your LLM application's performance. With Weave, you can:
 
-W&B Weave helps you build better language model apps. Use Weave to track, test, and improve your apps:
+* [Trace](/weave/quickstart) your application's LLM calls, capturing inputs, outputs, costs, and latency
+* [Evaluate](/weave/guides/core-types/evaluations) and [monitor](/weave/guides/evaluation/guardrails_and_monitors) your application's responses using scorers and LLM judges
+* [Log versions](/weave/tutorial-weave_models) of your application's code, prompts, datasets, and other attributes
+* [Create leaderboards](/weave/guides/core-types/leaderboards) to track and compare your application's performance over time
+* [Integrate Weave into your W&B reinforcement-learning training runs](/weave/guides/tools/weave-in-workspaces) to gain observability into how your models perform during training
 
-- **Track & Watch**: See how your language model calls work in live systems.
-- **Test Changes**: Try new prompts, data, and models safely.
-- **Run Tests**: Test models and prompts in the Playground.
-- **Check Performance**: Use response evaluation tools to track and measure how well your LLM app performs.
-- **Add Safety**: Protect your app with content filters and prompt guards.
+Weave works with many [popular frameworks](/weave/guides/integrations) and has both [Python](/weave/reference/python-sdk) and [TypeScript](/weave/reference/typescript-sdk) SDKs.
 
-Connect Weave to your code with:
-- [Python SDK](/weave/reference/python-sdk)
-- [TypeScript SDK](/weave/reference/typescript-sdk)
-- [Service API](/weave/reference/service-api)
+## Get started
 
-Weave works with many language model providers, local models, and tools.
+See the following quickstart docs to install Weave and learn how to integrate it into your code:
 
-## Get started
+* [Track LLM inputs and outputs](/weave/quickstart)
+* [Learn Weave with W&B Inference](/weave/quickstart-inference)
 
-New to Weave? Start with the [Python quickstart](/weave/quickstart) or TypeScript quickstart.
+You can also review the following Python example for a quick look at how Weave is implemented in code:
+
+<Accordion title="Send requests to OpenAI and evaluate their responses">
+
+The following example sends simple math questions to OpenAI and then evaluates the responses for correctness (in parallel) using a class-based `CorrectnessScorer`:
+
+<a target="_blank" href="https://colab.research.google.com/github/wandb/docs/blob/main/weave/cookbooks/source/intro_page_example.ipynb">
+  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
+</a>
+
+```python lines
+import weave
+from openai import OpenAI
+from weave import Scorer
+import asyncio
+
+# Initialize Weave
+weave.init("parallel-evaluation")
+
+# Create OpenAI client
+client = OpenAI()
+
+# Define your model as a weave.op function
+@weave.op
+def math_model(question: str) -> str:
+    response = client.chat.completions.create(
+        model="gpt-4",
+        messages=[
+            {"role": "user", "content": question}
+        ]
+    )
+    return response.choices[0].message.content
+
+# Create a dataset with questions and expected answers
+dataset = [
+    {"question": "What is 2+2?", "expected": "4"},
+    {"question": "What is 5+3?", "expected": "8"},
+    {"question": "What is 10-7?", "expected": "3"},
+    {"question": "What is 12*3?", "expected": "36"},
+    {"question": "What is 100/4?", "expected": "25"},
+]
+
+# Define a class-based scorer
+class CorrectnessScorer(Scorer):
+    """Scorer that checks if the answer is correct"""
+
+    @weave.op
+    def score(self, question: str, expected: str, output: str) -> dict:
+        """Check if the model output contains the expected answer"""
+        import re
+
+        # Extract numbers from the output
+        numbers = re.findall(r'\d+', output)
+
+        if numbers:
+            answer = numbers[0]
+            correct = answer == expected
+        else:
+            correct = False
+
+        return {
+            "correct": correct,
+            "extracted_answer": numbers[0] if numbers else None,
+            "contains_expected": expected in output
+        }
+
+# Instantiate the scorer
+correctness_scorer = CorrectnessScorer()
+
+# Create an evaluation
+evaluation = weave.Evaluation(
+    dataset=dataset,
+    scorers=[correctness_scorer]
+)
+
+# Run the evaluation - automatically evaluates examples in parallel
+asyncio.run(evaluation.evaluate(math_model))
+```
+
+To use this example, follow the [installation instructions](/weave/quickstart#1-install-w%26b-weave-and-create-an-api-key) in the first step of the quickstart. You also need an [OpenAI API key](https://platform.openai.com/api-keys).
+</Accordion>
 
 ## Advanced guides
 
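For orientation, the tracing workflow that the rewritten intro links to comes down to the two Weave calls the example above already uses: `weave.init()` to open a project and the `@weave.op` decorator to record calls. A minimal sketch, assuming the standard `weave` Python SDK (the project name `intro-example` is hypothetical):

```python
import weave

# Start logging traces to a Weave project
# (the project name "intro-example" is illustrative)
weave.init("intro-example")

# Every call to a @weave.op-decorated function is traced:
# inputs, output, latency, and any nested op calls are recorded
@weave.op
def add(a: int, b: int) -> int:
    return a + b

add(2, 2)  # shows up as a trace in the Weave UI
```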

Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Parallel Evaluation with W&B Weave\n",
+    "\n",
+    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/wandb/docs/blob/main/weave/cookbooks/source/parallel_evaluation_example.ipynb)\n",
+    "\n",
+    "This notebook demonstrates how to use W&B Weave to send math questions to OpenAI and evaluate the responses for correctness in parallel.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Installation\n",
+    "\n",
+    "First, install the required packages:\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install weave openai -qU\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Set Up API Keys\n",
+    "\n",
+    "Add your W&B and OpenAI API keys:\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "from getpass import getpass\n",
+    "\n",
+    "# Set your OpenAI API key\n",
+    "if not os.environ.get(\"OPENAI_API_KEY\"):\n",
+    "    os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter your OpenAI API key: \")\n",
+    "\n",
+    "# Log in to W&B\n",
+    "import wandb\n",
+    "wandb.login()\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Parallel Evaluation Example\n",
+    "\n",
+    "Run the evaluation example:\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import weave\n",
+    "from openai import OpenAI\n",
+    "from weave import Scorer\n",
+    "import asyncio\n",
+    "\n",
+    "# Initialize Weave\n",
+    "weave.init(\"parallel-evaluation\")\n",
+    "\n",
+    "# Create OpenAI client\n",
+    "client = OpenAI()\n",
+    "\n",
+    "# Define your model as a weave.op function\n",
+    "@weave.op\n",
+    "def math_model(question: str) -> str:\n",
+    "    response = client.chat.completions.create(\n",
+    "        model=\"gpt-4\",\n",
+    "        messages=[\n",
+    "            {\"role\": \"user\", \"content\": question}\n",
+    "        ]\n",
+    "    )\n",
+    "    return response.choices[0].message.content\n",
+    "\n",
+    "# Create a dataset with questions and expected answers\n",
+    "dataset = [\n",
+    "    {\"question\": \"What is 2+2?\", \"expected\": \"4\"},\n",
+    "    {\"question\": \"What is 5+3?\", \"expected\": \"8\"},\n",
+    "    {\"question\": \"What is 10-7?\", \"expected\": \"3\"},\n",
+    "    {\"question\": \"What is 12*3?\", \"expected\": \"36\"},\n",
+    "    {\"question\": \"What is 100/4?\", \"expected\": \"25\"},\n",
+    "]\n",
+    "\n",
+    "# Define a class-based scorer\n",
+    "class CorrectnessScorer(Scorer):\n",
+    "    \"\"\"Scorer that checks if the answer is correct\"\"\"\n",
+    "\n",
+    "    @weave.op\n",
+    "    def score(self, question: str, expected: str, output: str) -> dict:\n",
+    "        \"\"\"Check if the model output contains the expected answer\"\"\"\n",
+    "        import re\n",
+    "\n",
+    "        # Extract numbers from the output\n",
+    "        numbers = re.findall(r'\\d+', output)\n",
+    "\n",
+    "        if numbers:\n",
+    "            answer = numbers[0]\n",
+    "            correct = answer == expected\n",
+    "        else:\n",
+    "            correct = False\n",
+    "\n",
+    "        return {\n",
+    "            \"correct\": correct,\n",
+    "            \"extracted_answer\": numbers[0] if numbers else None,\n",
+    "            \"contains_expected\": expected in output\n",
+    "        }\n",
+    "\n",
+    "# Instantiate the scorer\n",
+    "correctness_scorer = CorrectnessScorer()\n",
+    "\n",
+    "# Create an evaluation\n",
+    "evaluation = weave.Evaluation(\n",
+    "    dataset=dataset,\n",
+    "    scorers=[correctness_scorer]\n",
+    ")\n",
+    "\n",
+    "# Run the evaluation - automatically evaluates examples in parallel\n",
+    "await evaluation.evaluate(math_model)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Note for Google Colab Users\n",
+    "\n",
+    "If you're running this notebook in Google Colab, you may need to handle async differently. Use this version instead:\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# For Google Colab, use this approach:\n",
+    "import nest_asyncio\n",
+    "nest_asyncio.apply()\n",
+    "\n",
+    "# Then run the evaluation\n",
+    "asyncio.run(evaluation.evaluate(math_model))\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## View Results\n",
+    "\n",
+    "After running the evaluation, you can view the results in the W&B Weave dashboard. The evaluation shows:\n",
+    "\n",
+    "1. **Parallel execution**: All examples are evaluated simultaneously for faster results\n",
+    "2. **Correctness scores**: Each response is scored based on whether it contains the correct answer\n",
+    "3. **Detailed metrics**: Including extracted answers and whether the expected value was found\n",
+    "\n",
+    "Visit your [W&B Weave dashboard](https://wandb.ai/home) to explore the evaluation results in detail.\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
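As background on the notebook's Colab note: Jupyter and Colab kernels already run an asyncio event loop, so a bare `asyncio.run()` inside a cell raises `RuntimeError` ("asyncio.run() cannot be called from a running event loop"), while `nest_asyncio.apply()` patches the running loop so the nested call is allowed. A minimal illustration of the pattern, independent of Weave:

```python
import asyncio

import nest_asyncio

async def main() -> str:
    return "done"

# In a plain script asyncio.run() works directly; inside a running
# event loop (Jupyter/Colab) it raises RuntimeError unless the loop
# is patched to allow nesting:
nest_asyncio.apply()
print(asyncio.run(main()))  # prints "done"
```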
