dRAG is a local, privacy-focused tool that helps educators answer discussion questions efficiently using their course materials, such as syllabi and lecture notes. It uses Retrieval-Augmented Generation (RAG) to ground responses in those materials, keeping sensitive data secure by running entirely offline. The project uses uv for dependency management and Ollama for both the LLM and the embedding model.
- PDF Indexing: Converts and indexes course materials for efficient retrieval.
- Customizable Prompt: Allows users to specify LLM prompt for tailored results.
- Ollama Integration: Uses `qwen2.5:latest` for generation and `snowflake-arctic-embed2` for embeddings.
- Offline Privacy: Ensures data security by running entirely offline.
Install uv.

On macOS and Linux:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

On Windows:

```powershell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

Clone the repository:

```bash
git clone https://github.com/Teaching-and-Learning-in-Computing/dRAG
cd dRAG
```

uv will install the dependencies defined in `pyproject.toml`:

```bash
uv sync
```

Place the PDF file you want to index in the `documents/source_documents/` folder. For example:

```
documents/source_documents/your_file_name.pdf
```
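To sanity-check which PDFs are staged for indexing, a small helper like the following can list them (illustrative only; the actual discovery logic lives in the repo's `index.py`, and the `staged_pdfs` name is an assumption):

```python
from pathlib import Path

def staged_pdfs(source_dir="documents/source_documents"):
    """Return the PDF files that would be picked up for indexing,
    sorted by name for deterministic ordering."""
    return sorted(Path(source_dir).glob("*.pdf"))
```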
Ensure Ollama is installed and running. Pull the required models:

```bash
ollama pull qwen2.5:latest
ollama pull snowflake-arctic-embed2
```

Start the Ollama server to enable model usage.
Update the `documents/input/input.json` file with the questions and answers you want to use. The JSON file should follow this structure:

```json
[
  {
    "questions": "What are the course prerequisites?",
    "answer": "Prerequisites include introductory programming and basic statistics."
  }
]
```

The `answer` field should contain a "ground truth" response provided by a professor or TA for evaluation purposes.
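A minimal sketch of reading such a file (the `load_qa_pairs` helper is hypothetical, assuming only the `questions`/`answer` keys shown above):

```python
import json

def load_qa_pairs(path="documents/input/input.json"):
    """Load question / ground-truth answer pairs from the input JSON file."""
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    # Each entry pairs a question with a professor-provided ground truth answer.
    return [(e["questions"], e["answer"]) for e in entries]
```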
Run the `main.py` script to process documents, generate answers, and evaluate results:

```bash
python main.py
```

The script executes the following steps in order:

- Indexes the specified PDF files using `index.py`.
- Processes input questions and generates answers using `generate.py`.
- Evaluates generated answers against the provided ground truth using `evaluate.py`.
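Under the hood, the indexing and generation steps depend on embedding similarity: each document chunk and each question is embedded (here via `snowflake-arctic-embed2`), and the chunks most similar to the question are passed to the LLM as context. A minimal sketch of that retrieval step, with precomputed vectors standing in for real embedding calls (function names are illustrative, not from the repo):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vec, chunk_vecs, k=2):
    """Indices of the k chunks most similar to the query embedding."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]
```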
- Adding Documents: If you add new documents, place them in the `documents/source_documents/` folder and rerun `main.py`.
- Modifying Inputs: To change questions or prompts, update the `input.json` file and rerun `main.py`.
- Model Updates: To switch models, pull the new models into Ollama and update the model names wherever they are referenced in the scripts.
- Place a syllabus PDF in `documents/source_documents/`.
- Add your questions and ground truth answers to `input.json`.
- Run `main.py` to index, generate, and evaluate results.