Description
To be able to release the corpus API, we need a way for users to CRUD the corpora on a given source storage. To make our lives a little easier, we are not targeting a UI for this yet, but should start with a CLI instead. Since the likely consumers of this feature are power users or admins, this is ok.
We can add a ragna corpus subcommand to the CLI. This in turn could have more subcommands:
- ragna corpus list: List all available corpora
- ragna corpus ingest: Ingest some documents into a given corpus (more on this later)
- ragna corpus delete: Delete a given corpus
- ragna corpus metadata: List all available metadata in a given corpus
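The subcommand layout above could be sketched roughly as follows. This is a minimal illustration using argparse from the standard library, not Ragna's actual CLI code; the positional arguments corpus_name and root are assumptions made for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a nested `corpus` subcommand (illustrative only)."""
    parser = argparse.ArgumentParser(prog="ragna")
    commands = parser.add_subparsers(dest="command", required=True)

    corpus = commands.add_parser("corpus", help="Manage corpora on a source storage")
    actions = corpus.add_subparsers(dest="action", required=True)

    actions.add_parser("list", help="List all available corpora")

    # `corpus_name` and `root` are hypothetical argument names.
    ingest = actions.add_parser("ingest", help="Ingest documents into a corpus")
    ingest.add_argument("corpus_name")
    ingest.add_argument("root", help="Root directory to recurse through")

    delete = actions.add_parser("delete", help="Delete a corpus")
    delete.add_argument("corpus_name")

    metadata = actions.add_parser("metadata", help="List metadata of a corpus")
    metadata.add_argument("corpus_name")

    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Invoked as `ragna corpus ingest my-corpus ./docs`, such a parser would yield command="corpus", action="ingest", corpus_name="my-corpus" and root="./docs".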
Each command needs the source storage the action should be applied to. We have a few options here, and we could potentially implement all of them:
- Add a --source-storage flag that accepts an import string similar to what we do in our config file, e.g. --source-storage ragna.source_storages.Chroma
- Allow passing a --config and only accept a --source-storage listed there. Also allow passing the source storage by its display name similar to the API, since we know the options.
- If we have a --config parameter and --source-storage is not passed, offer the user an interactive list of available source storages to select from
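To make the first two options more concrete, here is a rough sketch of how a --source-storage value might be resolved. The function names and the idea of passing the configured storages as a mapping from display name to class are assumptions for illustration; how Ragna's config actually exposes them may differ.

```python
import importlib


def resolve_import_string(import_string: str):
    """Import `package.module.Name` and return the `Name` object."""
    module_name, _, attribute = import_string.rpartition(".")
    if not module_name:
        raise ValueError(f"{import_string!r} is not a dotted import string")
    module = importlib.import_module(module_name)
    try:
        return getattr(module, attribute)
    except AttributeError as error:
        raise ValueError(
            f"{module_name!r} has no attribute {attribute!r}"
        ) from error


def select_source_storage(value: str, configured=None):
    """Resolve a --source-storage value.

    `configured` is a hypothetical mapping of display name -> class taken
    from --config. Without it, `value` is treated as an import string.
    """
    if configured is None:
        return resolve_import_string(value)
    if value in configured:  # passed by display name
        return configured[value]
    for cls in configured.values():  # passed by import string
        if f"{cls.__module__}.{cls.__qualname__}" == value:
            return cls
    raise ValueError(f"{value!r} is not listed in the config")
```

With a config present, anything not listed there is rejected, which matches the second option. The interactive selection from the third option would slot in where value is missing.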
ragna corpus ingest is the trickiest of them. IMO a reasonable default behavior would be:
- We recurse through a positionally supplied root directory
- We call LocalDocument.from_path on each path for which we have an available DocumentHandler
From there on, it is just a matter of calling SourceStorage.store and inserting the documents into Ragna's database.
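The discovery step of this default behavior might look something like the sketch below. It only covers walking the root directory and filtering by file suffix; the SUPPORTED_SUFFIXES set is a hypothetical stand-in for "paths for which we have an available DocumentHandler" and would need to come from Ragna itself.

```python
from pathlib import Path

# Hypothetical: stands in for the suffixes that have a DocumentHandler.
SUPPORTED_SUFFIXES = frozenset({".txt", ".md", ".pdf"})


def find_ingestable_paths(root, supported_suffixes=SUPPORTED_SUFFIXES):
    """Recurse through `root` and return all files with a supported suffix."""
    root = Path(root)
    if not root.is_dir():
        raise NotADirectoryError(str(root))
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in supported_suffixes
    )
```

Each returned path would then be handed to LocalDocument.from_path, and the resulting documents passed on to SourceStorage.store.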
The tricky part comes when opening this up to other behavior than just the default:
- Can we assume that by calling ragna corpus ingest you want to ingest files from local disk?
- Should we enforce that every Document class has a from_path classmethod in order for us to create an arbitrary Document subclass if we have nothing more than a path?
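For the second question, one possible way to enforce a from_path classmethod is an abstract classmethod, sketched below. PathConstructible, TextFileDocument and RemoteDocument are made-up stand-ins for illustration and are not Ragna's Document classes.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class PathConstructible(ABC):
    """Stand-in base class: subclasses must be constructible from a path."""

    @classmethod
    @abstractmethod
    def from_path(cls, path):
        ...


class TextFileDocument(PathConstructible):
    def __init__(self, path):
        self.path = Path(path)

    @classmethod
    def from_path(cls, path):
        return cls(path)


class RemoteDocument(PathConstructible):
    # No from_path here, so this class cannot be instantiated.
    def __init__(self, url):
        self.url = url
```

Under this design, a subclass that cannot sensibly be built from a path, such as RemoteDocument above, fails with a TypeError on instantiation. That is the trade-off of making from_path mandatory for every Document.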
I would love to hear from @nenb @blakerosenthal @dillonroach how this is done in the existing deployment.