Skip to content

Conversation

@simonpcouch
Copy link
Collaborator

Closes #52.

Opening as draft just to communicate that I'd want to test more thoroughly and make sure I understand the security implications here, but I was impressed with how easy this was to make happen, how powerful it feels already, and how effectively sandboxed it feels from the R session. Do feel free to leave comments before I mark it as ready for review. :)

Copy link
Collaborator

@gadenbuie gadenbuie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I'm glad you're going down this road! Querying data frames via duckdb is also what I had in mind (I have some code in a local branch somewhere that I can dig up if needed).

Comment on lines +6 to +12
btw_tool_env_query_data_frame <- function(query, data_frame) {
d <- get(data_frame)
conn <- btw_connection()

if (!DBI::dbExistsTable(conn, data_frame)) {
duckdb::duckdb_register(conn, data_frame, d, experimental = FALSE)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about adding a registration system for these tables? The usage would be something like

btw_register_table(mtcars)
btw_register_table(mtcars, "my_mtcars")

client <- btw_client()

If done before the client is created, the tool description could include the table names. Or we could also have a "list tables" tool to introduce the available tables to the LLM.

The other advantage is that the LLM wouldn't need to provide data_frame, it would only need to write the SQL code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There'd be friction in needing to register tables for use with btw, but I also like the explicit consent nature.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh and the registration function would be a generic, so we could also take existing connections, remote tables, .csv files etc. (eventually, but I'd stick with starting with R data frames via duckdb).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you imagine that the registration is optional or required? I do like the "if it's in my global environment, the model can find it" paradigm that's set with the existing environment and data frame description tools, which makes me lean towards making registration optional. There's also already some existing models of explicit consent happening depending on the UI; i.e. if btw is hooked up to Claude Desktop or Claude Code via acquaint, the first usages of the tools will ask for the user's permission anyway.

I do see the value in the registration system in that get() doesn't have a lot of uses beyond the global environment and models are otherwise unable to "fetch" data up to this point. The registration system sounds really helpful for those other types of data sources.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

a tool to explore/manipulate data frames

2 participants