-
Notifications
You must be signed in to change notification settings - Fork 3
add tool to run queries against data frames #56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I'm glad you're going down this road! Querying data frames via duckdb is also what I had in mind (I have some code in a local branch somewhere that I can dig up if needed).
| btw_tool_env_query_data_frame <- function(query, data_frame) { | ||
| d <- get(data_frame) | ||
| conn <- btw_connection() | ||
|
|
||
| if (!DBI::dbExistsTable(conn, data_frame)) { | ||
| duckdb::duckdb_register(conn, data_frame, d, experimental = FALSE) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about adding a registration system for these tables? The usage would be something like
btw_register_table(mtcars)
btw_register_table(mtcars, "my_mtcars")
client <- btw_client()If done before the client is created, the tool description could include the table names. Or we could also have a "list tables" tool to introduce the available tables to the LLM.
The other advantage is that the LLM wouldn't need to provide data_frame, it would only need to write the SQL code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There'd be friction in needing to register tables for use with btw, but I also like the explicit consent nature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh and the registration function would be a generic, so we could also take existing connections, remote tables, .csv files etc. (eventually, but I'd stick with starting with R data frames via duckdb).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you imagine that the registration is optional or required? I do like the "if it's in my global environment, the model can find it" paradigm that's set with the existing environment and data frame description tools, which makes me lean towards making registration optional. There's also already some existing models of explicit consent happening depending on the UI; i.e. if btw is hooked up to Claude Desktop or Claude Code via acquaint, the first usages of the tools will ask for the user's permission anyway.
I do see the value in the registration system in that get() doesn't have a lot of uses beyond the global environment and models are otherwise unable to "fetch" data up to this point. The registration system sounds really helpful for those other types of data sources.
Closes #52.
Opening as draft just to communicate that I'd want to test more thoroughly and make sure I understand the security implications here, but I was impressed with how easy this was to make happen, how powerful it feels already, and how effectively sandboxed it feels from the R session. Do feel free to leave comments before I mark it as ready for review. :)