
Commit 8f5055a

Add example for using list of string + assets with Chat class
1 parent 1b5337e commit 8f5055a

1 file changed: +52 -0 lines changed

docs/features/models/transformers_multimodal.md

Lines changed: 52 additions & 0 deletions
@@ -132,6 +132,58 @@ response = model(prompt, max_new_tokens=50)
print(response) # 'A Siamese cat with blue eyes is sitting on a cat tree, looking alert and curious.'
```

Or using a list containing text and assets as the message content:

```python
import outlines
from outlines.inputs import Chat, Image
from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image as PILImage
from io import BytesIO
import requests
import torch


TEST_MODEL = "Qwen/Qwen2.5-VL-7B-Instruct"

# Download an image from a URL and load it as a PIL image
def get_image(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }
    r = requests.get(url, headers=headers)
    image = PILImage.open(BytesIO(r.content)).convert("RGB")
    image.format = "PNG"
    return image

model_kwargs = {
    "torch_dtype": torch.bfloat16,
    # "attn_implementation": "flash_attention_2",
    "device_map": "auto",
}

# Create a model
model = outlines.from_transformers(
    AutoModelForImageTextToText.from_pretrained(TEST_MODEL, **model_kwargs),
    AutoProcessor.from_pretrained(TEST_MODEL, **model_kwargs),
)

# Create the chat input
prompt = Chat([
    {"role": "user", "content": "You are a helpful assistant that helps me describe pictures."},
    {"role": "assistant", "content": "I'd be happy to help you describe pictures! Please go ahead and share an image."},
    {
        "role": "user",
        "content": ["Briefly describe the image.", Image(get_image("https://upload.wikimedia.org/wikipedia/commons/2/25/Siam_lilacpoint.jpg"))]
    },
])

# Call the model to generate a response
response = model(prompt, max_new_tokens=50)
print(response) # 'The image shows a light-colored cat with a white chest...'
```

### Batching
The `TransformersMultiModal` model supports batching through the `batch` method. To use it, provide a list of prompts (in any of the formats described above) to the `batch` method; you will receive a list of completions in return.
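
For instance, here is a minimal batching sketch that reuses the `model`, `Image`, and `get_image` helper from the example above, and assumes `batch` accepts the same generation keyword arguments as a direct call:

```python
# Minimal batching sketch: `model`, `Image`, and `get_image` come from the
# example above; the prompts and max_new_tokens value are illustrative.
image_url = "https://upload.wikimedia.org/wikipedia/commons/2/25/Siam_lilacpoint.jpg"

prompts = [
    ["Describe the animal in the image.", Image(get_image(image_url))],
    ["What colors appear in the image?", Image(get_image(image_url))],
]

# `batch` returns one completion per prompt, in the same order.
responses = model.batch(prompts, max_new_tokens=50)
for response in responses:
    print(response)
```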
