Guidance on persisting messages #4845
-
Persisting messages as it stands feels full of pitfalls. The ones I'm encountering are: …
-
Hey! I pulled together an example with Postgres and Drizzle that stores messages atomically rather than loading and saving full chats on each new response: https://github.com/nicoalbanese/ai-sdk-persistence-db. Quick heads up: we are making major improvements to …
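To illustrate the atomic idea, here is a rough sketch (not copied from the linked repo; the `db` client and `messages` table are assumed to be defined elsewhere): only the new row is inserted instead of re-saving the whole chat.

```ts
import { db } from "./db";            // assumed Drizzle client
import { messages } from "./schema";  // assumed messages table

// Insert only the newly generated message instead of rewriting the full chat history.
async function appendMessage(
  chatId: string,
  message: { id: string; role: string; parts: unknown },
) {
  await db.insert(messages).values({
    id: message.id,
    chatId,
    role: message.role,
    parts: message.parts, // a jsonb column in this simplified sketch
  });
}
```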
-
Is there a proper way to manage persistence using vercel-ai yet, including tool calls and type safety?
-
The Message has toolInvocations and parts properties, which are useful in the [Generative User Interfaces](https://sdk.vercel.ai/docs/ai-sdk-ui/generative-user-interfaces) section. Is it necessary to save this info to the database for persistence?
-
Here is how I store it:

Database Structure

I use two main tables with a one-to-many relationship. Instead of storing the entire history as a JSON blob, I store individual messages with their metadata. The complex parts (sources, tool invocations, etc.) can either be: …

When retrieving messages, I use SQL JOINs to reconstruct the complete conversation and then pass it through a message parser function that converts the database records to the proper Message types expected by the chat interface. Works super well and is (quite) easy to set up :P I made an example repo here: https://github.com/ElectricCodeGuy/SupabaseAuthWithSSR

Note that the attachment is stored as a data string, but this might not be optimal if the data string is 20MB. A bucket would be a better option, but parsing it back and forth between client and server becomes a bit tricky.
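A minimal sketch of what such a parser function might look like (the row shape and field names here are assumptions for illustration, not taken from the linked repo):

```ts
import type { Message } from "ai";

// Hypothetical shape of a joined row coming back from the database.
interface MessageRow {
  id: string;
  role: Message["role"];
  content: string;
  created_at: string;
  tool_invocations: Message["toolInvocations"] | null;
}

// Convert database records into the Message type expected by the chat UI.
function parseMessages(rows: MessageRow[]): Message[] {
  return rows.map((row) => ({
    id: row.id,
    role: row.role,
    content: row.content,
    createdAt: new Date(row.created_at),
    toolInvocations: row.tool_invocations ?? undefined,
  }));
}
```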
-
Recently I participated in a hackathon where I wanted to build a simple "template" app with chat history well integrated using Drizzle... I quite like the result: each message is stored as jsonb.
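Roughly what that can look like with Drizzle (a sketch for illustration, not the actual hackathon schema):

```ts
import { pgTable, varchar, jsonb, timestamp } from "drizzle-orm/pg-core";

// Each message row keeps its parts as a single jsonb value.
export const messages = pgTable("messages", {
  id: varchar("id").primaryKey(),
  chatId: varchar("chat_id").notNull(),
  role: varchar("role").notNull(),
  parts: jsonb("parts").notNull(),
  createdAt: timestamp("created_at").defaultNow().notNull(),
});
```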
-
Looking at the Vercel AI Chatbot that Matt linked, this is how I transform the response messages:

```ts
import { appendResponseMessages, CoreAssistantMessage, CoreToolMessage, Message, streamText } from "ai";
type ResponseMessage = (CoreAssistantMessage | CoreToolMessage) & { id: string }
// Wrap the response messages in a dummy user message so appendResponseMessages
// can merge them, then take the appended assistant message at index 1.
const responseMessagesToMessage = (responseMessages: ResponseMessage[]): Message =>
appendResponseMessages({
messages: [{ id: 'unused', role: 'user', content: 'unused' }],
responseMessages,
})[1]
streamText({
model: azureChatModel,
messages,
tools,
maxSteps: 10,
onFinish: async (result) => await insertMessage(sessionId, responseMessagesToMessage(result.response.messages))
})
```
-
Hey folks - I've updated this template to store … This is the recommended approach at the moment, but I'm very open to feedback and improvements here!
-
Persistence in the onFinish callback is an incomplete/insufficient approach for multi-step agents. There's an issue with more complex agents in real-world use cases that have complex tools with side effects: sometimes a stream may error midway through. If the agent is 5 tool calls deep and has already made significant mutations to various entities in the system, your example doesn't persist the partial generation (the parts/final partial part leading up to the error).

This sucks from a UX perspective because when the user goes to interact with the agent again: a) the message history is corrupted - even though the agent made mutations, the user can no longer see them in the chat history, even though the entities have changed; and b) the agent doesn't remember which mutations it has already made, since they were never persisted in the chat history, leading it to start repeating its previous instructions from the user and duplicate all of its efforts.

Right now my workaround is having the client handle this partial errored-state persistence by hitting a POST /chat/partial endpoint, which sends the partial message that was streamed down to the user back up to the server for persistence (a very unnecessary round trip with additional potential points of failure, in my view). Open to any thoughts on this!
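As a rough illustration of that workaround (not the commenter's actual code; the endpoint shape, chatId, and helper names are assumptions), the client catches the stream error and posts the partially streamed assistant message back up for persistence:

```ts
// Client-side sketch: persist the partially streamed assistant message when the stream errors.
// The /chat/partial endpoint and payload shape are assumptions for illustration.
async function persistPartialMessage(chatId: string, partialMessage: unknown) {
  await fetch("/chat/partial", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ chatId, message: partialMessage }),
  });
}

// e.g. wired into useChat's error callback, sending whatever was streamed so far:
// const { messages } = useChat({
//   onError: () => persistPartialMessage(chatId, messages.at(-1)),
// });
```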
-
That is a very strong idea. Initially I liked the Vercel AI SDK because it was just a simple interface for different LLM providers, but with the multi-step options it is turning into a mini framework, which I wanted to avoid in the first place. I might explore this as well.

On Sat, 24 May 2025 at 03:59, Michael Tromba wrote:
Follow-up: I have begun experimenting with a single-step architecture where I set maxSteps: 1 and let the client control its recursive LLM calling logic.

So in the client-side onFinish, I check if the finish reason was a tool call, and if so, and my local steps counter has not exceeded a client-defined maximum, I append a new user message with id: CONTINUE which I:

a) exclude from my chat history rendering by detecting that id, and
b) detect server-side as a continuation request and handle it accordingly.

The id: CONTINUE messages also get excluded from my persistence logic server-side.

The benefit of this is that I am able to persist in the onFinish callback more atomically - on every step. It also gives me more granular control over the context window I'm providing on each step, instead of letting the SDK just keep naively appending to it, bloating the tokens being sent as input.

Still experimenting and not sure how practical this is yet, but it looks promising.
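A minimal sketch of the continuation idea described above (not the original code; the CONTINUE id handling, the step cap, and an AI SDK 4-style useChat from @ai-sdk/react are all assumptions):

```ts
// Client-side sketch of the single-step continuation loop.
import { useChat } from "@ai-sdk/react";
import { useRef } from "react";

const MAX_CLIENT_STEPS = 5; // client-defined maximum

export function useContinuationChat(chatId: string) {
  const steps = useRef(0);
  const chat = useChat({
    id: chatId,
    onFinish: (_message, { finishReason }) => {
      // If the model stopped to call a tool and we haven't hit our cap,
      // append a synthetic continuation message that the server recognizes.
      if (finishReason === "tool-calls" && steps.current < MAX_CLIENT_STEPS) {
        steps.current += 1;
        chat.append({ id: "CONTINUE", role: "user", content: "" });
      }
    },
  });

  // Hide the synthetic continuation messages from rendering; the server is
  // assumed to exclude them from persistence as well.
  const visibleMessages = chat.messages.filter((m) => m.id !== "CONTINUE");
  return { ...chat, messages: visibleMessages };
}
```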
-
Related to some other commenters, my biggest challenge seems to be handling the SSE stream being cancelled due to client disconnect. Ideally, when the client calls the backend, there would be some way to separate out the actual streamText call so that it keeps running until it completes, with appropriate message persistence; then, if the client disconnects, only the layer above the streamText call gets cancelled. Anyone have suggestions on how this could be done? I'm not super keen on breaking the streamText call out into a separate architectural component outside of, say, a Next.js API route, because then it feels like we are doing more work than necessary to stand up a message queue or Redis stream or something, and surely there is a better / simpler solution to this?
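One pattern worth trying (a sketch, assuming a recent AI SDK version that exposes consumeStream on the streamText result, and a hypothetical saveMessage helper): consume the stream on the server independently of the HTTP response, so onFinish and persistence still run even if the client disconnects.

```ts
import { openai } from "@ai-sdk/openai";
import { streamText } from "ai";

// Assumed persistence helper - replace with your own database write.
async function saveMessage(text: string) {
  /* ... */
}

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: openai("gpt-4o-mini"),
    messages,
    onFinish: async ({ text }) => {
      await saveMessage(text);
    },
  });

  // Keep reading the stream server-side even if the client disconnects,
  // so onFinish (and persistence) still runs to completion.
  result.consumeStream();

  return result.toDataStreamResponse();
}
```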
-
Is there any value in persisting …
-
Would be amazing to have a guide on persisting messages in v5. Feels like it's still an open question that almost everyone runs into at some point.
-
Just an update here that this was a big focus of v5, and we will have an updated template with best practices, incorporating a lot of your feedback here! Importantly, we do not store things as …
-
Thanks for bearing with me here! This template has now been updated to AI SDK 5 and now uses a much more robust and scalable persistence pattern. I have done my best to optimise performance where possible. Please do let me know feedback and how we can make this better. For a TLDR of the new pattern, please see the README.

So what changed?

Previously, we were storing chats and messages. This was simple, but parts were stored as a jsonb() column. This obviously presented data integrity and migration issues.

Prefix-based part storage

To resolve this, we've moved to a prefix-based approach for storing message parts directly in the database schema. Instead of using a flexible but problematic JSONB column, we now have dedicated columns for each message part type with specific prefixes.

So where before our message schema looked like this:

```ts
export const chats = pgTable("chats", {
id: varchar()
.primaryKey()
.$defaultFn(() => nanoid()),
});
export const roleEnum = pgEnum("role", ["user", "assistant", "system", "data"]);
export const messages = pgTable("messages", {
id: varchar()
.primaryKey()
.$defaultFn(() => nanoid()),
chatId: varchar()
.references(() => chats.id, { onDelete: "cascade" })
.notNull(),
createdAt: timestamp().defaultNow().notNull(),
parts: jsonb().$type<UIMessage["parts"]>().notNull(), // BAD
role: roleEnum().notNull(),
});
```

Now, it looks like this:

```ts
export const chats = pgTable("chats", {
id: varchar()
.primaryKey()
.$defaultFn(() => generateId()),
});
export const messages = pgTable(
"messages",
{
id: varchar()
.primaryKey()
.$defaultFn(() => generateId()),
chatId: varchar()
.references(() => chats.id, { onDelete: "cascade" })
.notNull(),
createdAt: timestamp().defaultNow().notNull(),
role: varchar().$type<MyUIMessage["role"]>().notNull(),
},
(table) => [
index("messages_chat_id_idx").on(table.chatId),
index("messages_chat_id_created_at_idx").on(table.chatId, table.createdAt),
],
);
export const parts = pgTable(
"parts",
{
id: varchar()
.primaryKey()
.$defaultFn(() => generateId()),
messageId: varchar()
.references(() => messages.id, { onDelete: "cascade" })
.notNull(),
type: varchar().$type<MyUIMessage["parts"][0]["type"]>().notNull(),
createdAt: timestamp().defaultNow().notNull(),
order: integer().notNull().default(0),
// Text fields
text_text: text(),
// Reasoning fields
reasoning_text: text(),
// File fields
file_mediaType: varchar(),
file_filename: varchar(), // optional
file_url: varchar(),
// Source url fields
source_url_sourceId: varchar(),
source_url_url: varchar(),
source_url_title: varchar(), // optional
// Source document fields
source_document_sourceId: varchar(),
source_document_mediaType: varchar(),
source_document_title: varchar(),
source_document_filename: varchar(), // optional
// tools are stored in separate cols
tool_getWeatherInformation_toolCallId: varchar(),
tool_getWeatherInformation_state: varchar().$type<ToolUIPart["state"]>(),
tool_getWeatherInformation_input:
jsonb().$type<getWeatherInformationInput>(),
tool_getWeatherInformation_output:
jsonb().$type<getWeatherInformationOutput>(),
tool_getWeatherInformation_errorText: varchar(),
// Data parts
data_weather_id: varchar().$defaultFn(() => generateId()),
data_weather_location: varchar().$type<MyDataPart["weather"]["location"]>(),
data_weather_weather: varchar().$type<MyDataPart["weather"]["weather"]>(),
data_weather_temperature:
real().$type<MyDataPart["weather"]["temperature"]>(),
providerMetadata: jsonb().$type<MyProviderMetadata>(),
},
(t) => [
// Indexes for performance optimisation
index("parts_message_id_idx").on(t.messageId),
index("parts_message_id_order_idx").on(t.messageId, t.order),
// Other constraints
],
);
```

This prefix-based column naming convention provides several key advantages, including type safety with strongly-typed columns, better query performance through direct column access, database-level data integrity constraints, migration-friendly schema changes, and efficient indexing.

Simplified message persistence workflow

The other big change that comes thanks to AI SDK 5 is where and how we are saving messages. Our suggestion has always been to persist messages in the onFinish callback. However, saving messages there previously meant reconstructing the UI message from the response messages yourself:

```ts
const result = streamText({
model: openai("gpt-4o-mini"),
messages: convertToModelMessages(messages),
maxSteps: 5,
tools,
});
return result.toDataStreamResponse({
onFinish: async ({ response }) => {
const newMessage = appendResponseMessages({
messages,
responseMessages: response.messages,
}).at(-1)!;
await upsertMessage({
id: newMessage.id,
chatId: chatId,
message: newMessage as UIMessage,
});
},
});
```

That is why we've made changes to the onFinish callback:

```ts
const result = streamText({
model: openai("gpt-4o-mini"),
messages: convertToModelMessages(messages),
stopWhen: stepCountIs(5),
tools,
});
return result.toUIMessageStreamResponse({
originalMessages: messages, // pass in all previous messages
onFinish: async ({ responseMessage, messages }) => {
// save just most recent assistant message with responseMessage
await saveMessage({
chatId,
id: responseMessage.id,
message: responseMessage,
});
// or, save full message history with messages
await saveChat({
chatId,
messages,
});
},
});
```

There are obviously many more changes, but these are the two central changes that really improve the overall process of persisting your AI SDK application state.

What's missing?

This template isn't perfect and is still being improved. Notably, it's missing persistence of partial state. We will be working on this soon but wanted to get this template out so folks could comment and improve where necessary.
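For readers wondering how the prefix-based rows map back to UI messages, here is a rough sketch of the reconstruction step (not taken from the template; the row shape and helper name are illustrative, and only text and reasoning parts are handled):

```ts
// Sketch: rebuild UIMessage parts from prefix-based "parts" rows loaded per message.
interface PartRow {
  type: string;
  order: number;
  text_text: string | null;
  reasoning_text: string | null;
}

type UIPart =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string };

function rowsToParts(rows: PartRow[]): UIPart[] {
  return rows
    .sort((a, b) => a.order - b.order)
    .flatMap((row) => {
      if (row.type === "text" && row.text_text !== null) {
        return [{ type: "text" as const, text: row.text_text }];
      }
      if (row.type === "reasoning" && row.reasoning_text !== null) {
        return [{ type: "reasoning" as const, text: row.reasoning_text }];
      }
      return []; // file, source, tool, and data parts would be handled similarly
    });
}
```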