dRAG is a local, privacy-focused tool that helps educators answer discussion questions efficiently using their course materials, such as syllabi and lecture notes. It uses Retrieval-Augmented Generation (RAG) to ground responses in those materials, keeping sensitive data secure by running entirely offline. The project uses uv for dependency management and Ollama for both the LLM and the embedding model.
- PDF Indexing: Converts and indexes course materials for efficient retrieval.
- Customizable Prompt: Allows users to specify LLM prompt for tailored results.
- Ollama Integration: Uses `qwen2.5:latest` for generation and `snowflake-arctic-embed2` for embeddings.
- Offline Privacy: Ensures data security by running entirely offline.
Install uv.

On macOS and Linux:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

On Windows:

```powershell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

Clone the repository:

```bash
git clone https://github.com/Teaching-and-Learning-in-Computing/dRAG
cd dRAG
```

uv will install the dependencies defined in `pyproject.toml`:

```bash
uv sync
```

Place the PDF file you want to index in the `documents/source_documents/` folder. For example:

```
documents/source_documents/your_file_name.pdf
```
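To sanity-check which PDFs are staged for indexing, a small helper like the following can list them (illustrative only; the actual discovery logic lives in the repo's `index.py`, and the `staged_pdfs` name is an assumption):

```python
from pathlib import Path

def staged_pdfs(source_dir="documents/source_documents"):
    """Return the PDF files that would be picked up for indexing,
    sorted by name for deterministic ordering."""
    return sorted(Path(source_dir).glob("*.pdf"))
```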
Ensure Ollama is installed and running. Pull the required models:

```bash
ollama pull qwen2.5:latest
ollama pull snowflake-arctic-embed2
```

Start the Ollama server to enable model usage.
Update the `documents/input/input.json` file with the questions and answers you want to use. The JSON file should follow this structure:

```json
[
  {
    "questions": "What are the course prerequisites?",
    "answer": "Prerequisites include introductory programming and basic statistics."
  }
]
```

The `answer` field should contain a "ground truth" response provided by a professor or TA for evaluation purposes.
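A minimal sketch of reading such a file (the `load_qa_pairs` helper is hypothetical, assuming only the `questions`/`answer` keys shown above):

```python
import json

def load_qa_pairs(path="documents/input/input.json"):
    """Load question / ground-truth answer pairs from the input JSON file."""
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    # Each entry pairs a question with a professor-provided ground truth answer.
    return [(e["questions"], e["answer"]) for e in entries]
```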
Run the `main.py` script to process documents, generate answers, and evaluate results:

```bash
python main.py
```

The script executes the following steps in order:

- Indexes the specified PDF files using `index.py`.
- Processes input questions and generates answers using `generate.py`.
- Evaluates generated answers against the provided ground truth using `evaluate.py`.
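Under the hood, the indexing and generation steps depend on embedding similarity: each document chunk and each question is embedded (here via `snowflake-arctic-embed2`), and the chunks most similar to the question are passed to the LLM as context. A minimal sketch of that retrieval step, with precomputed vectors standing in for real embedding calls (function names are illustrative, not from the repo):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vec, chunk_vecs, k=2):
    """Indices of the k chunks most similar to the query embedding."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]
```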
- Adding Documents: If you add new documents, place them in the `documents/source_documents/` folder and rerun `main.py`.
- Modifying Inputs: To change questions or prompts, update the `input.json` file and rerun `main.py`.
- Model Updates: To switch models, pull the new models into Ollama and update the model names wherever they are referenced in the scripts.
- Place a syllabus PDF in `documents/source_documents/`.
- Add your questions and ground truth answers to `input.json`.
- Run `main.py` to index, generate, and evaluate results.