Study Utils is a collection of scripts that support the full study loop for online courses and self-directed learning.
It captures some of the key ideas of a Personal Knowledge & Learning System (PKLS) that I've written about on my essays site: https://essays.mattlebrun.com/2025/09/personal-knowledge-learning-system.html
CLI-first tooling for:
- Converting various document formats to Markdown
- Reshaping documents into study-friendly formats
- Creating Retrieval-Augmented Generation (RAG) databases for study and topic exploration
- Creating and taking quizzes to reinforce learning

Requirements:
- Python 3.12+
- OPENAI_API_KEY environment variable for AI-powered features (supports .env)
Optional but recommended:
- System libraries for WeasyPrint (for Markdown → PDF conversion)
- ffmpeg for video transcription (pydub dependency)
- pandoc for epub → markdown conversion (if you want epub support)
- uv for dependency and virtual environment management
- A markdown document viewer/editor of your choice
This is currently tested on Ubuntu 22.04/24.04 (WSL) and Termux on Android.
Technically it should work on macOS, but I haven't tested it there.
Assuming you have the system requirements in place, you can get started in a few steps.
Create a study workspace:
mkdir -p ~/Study && cd ~/Study
Create configuration files:
printf 'OPENAI_API_KEY=your_api_key_here\n' > .env
Install with pip:
uv pip install git+https://github.com/cr8ivecodesmith/study-utils.git
Initialize the workspace:
uv run study init
Convert documents to Markdown:
Create a materials/ directory and drop your source documents there:
mkdir -p materials
uv run study convert-markdown materials
The converted Markdown files will land in ~/.study-utils-data/converted by default.
Reshape documents to study formats:
Scaffold the document prompts configuration (skip if study init already
created it):
uv run study generate-document config init
Create a reading assignment and a keywords list from a converted Markdown file:
uv run study generate-document reading_assignment assignments/lesson-01.md ~/.study-utils-data/converted
uv run study generate-document keywords assignments/lesson-01-keywords.md ~/.study-utils-data/converted
Create a RAG database:
Initialize and create a RAG database from the converted materials:
uv run study rag config init
uv run study rag ingest --name lesson-01 ~/.study-utils-data/converted
Explore the study materials:
uv run study rag chat --db lesson-01
Take quizzes:
Initialize the quiz:
uv run study quizzer init lesson-01
Then edit quizzer.toml and set:
[quiz.lesson-01]
sources = ["/path/to/home/.study-utils-data/converted"]
Generate topics and questions:
uv run study quizzer topics generate lesson-01 --use-ai
uv run study quizzer questions generate lesson-01 --per-topic 5 --ensure-coverage
Start a quiz session:
uv run study quizzer start lesson-01 --num 5

- uv is the preferred workflow for managing dependencies and virtual environments. Run uv sync --dev to install everything needed for local development.
- When working inside Termux, pyenv remains the more reliable option for managing Python versions and virtual environments. Install Python 3.12 with pyenv and create a virtual environment before running the tooling.

- Run uv run pytest (or just test) to execute the full suite. Coverage is enforced at 100% via pytest-cov; any regression will fail the run.
- The tests run entirely offline by default. Shared fixtures under tests/fixtures/ provide stubs for OpenAI, WeasyPrint, dotenv, and pydub to keep runs deterministic.
- For local debugging you can generate a detailed report with uv run pytest --cov-report=term-missing. To temporarily relax the coverage gate while debugging, drop --cov-fail-under=100 from pyproject.toml and restore it before committing.

- Run study init to bootstrap the shared workspace (defaults to ~/.study-utils-data). The command creates converted/, logs/, and config/ directories and accepts --path for alternate locations.
- Configuration files live under <workspace>/config. Use study convert-markdown config init to scaffold convert_markdown.toml with documented defaults. Pass --workspace to target a specific workspace or --path/--force for fully custom destinations.
- All CLI entry points respect the STUDY_UTILS_DATA_HOME environment variable when resolving the workspace; if unset they fall back to the default directory created by study init.
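
The workspace lookup described above can be pictured with a short Python sketch. This is not the project's actual code: the function names resolve_workspace and workspace_dirs are hypothetical, and treating an empty or whitespace-only variable as unset is an assumption.

```python
import os
from pathlib import Path

# Default workspace location named in this README.
DEFAULT_WORKSPACE = Path.home() / ".study-utils-data"


def resolve_workspace(env=None):
    """Return the workspace path, preferring STUDY_UTILS_DATA_HOME when set.

    Hypothetical helper: an empty or whitespace-only value is treated as unset,
    which is an assumption rather than documented behavior.
    """
    env = os.environ if env is None else env
    override = env.get("STUDY_UTILS_DATA_HOME", "").strip()
    if override:
        return Path(override).expanduser()
    return DEFAULT_WORKSPACE


def workspace_dirs(workspace):
    """Map the subdirectory names that study init is described as creating."""
    return {name: Path(workspace) / name for name in ("converted", "logs", "config")}


if __name__ == "__main__":
    ws = resolve_workspace()
    for name, path in workspace_dirs(ws).items():
        print(f"{name}: {path}")
```

Passing the environment as a plain mapping keeps the sketch easy to exercise without touching the real environment.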
To enable Markdown → PDF generation with WeasyPrint, install its system libraries and fonts.
- Base setup:
sudo apt-get update
sudo apt-get install -y libcairo2 libpango-1.0-0 libgdk-pixbuf2.0-0 libffi-dev libxml2 libxslt1.1 shared-mime-info
- Recommended fonts:
sudo apt-get install -y fonts-dejavu fonts-liberation
- To enable video transcription, ensure ffmpeg is installed (pydub uses it under the hood):
sudo apt-get install -y ffmpeg
- Download all materials for a module.
- At the very least, get the transcript file for the video.
- If there's no transcript file available, use study transcribe-video.
All tooling is routed through the study console script. Run study list to
see available commands or study help <command> for details and supported
flags.
study transcribe-video TARGET [options]
- Transcribe one .mp4 file or a directory of .mp4 files using Whisper-1.
- Supports optional recursion, list/preview mode, smart names with optional AI refinement, and composable filename prefixes.
Examples:
- Preview names and save editable cache:
uv run study transcribe-video ./videos --list --smart-names --use-ai
- Transcribe a directory recursively with smart names and counter prefix:
study transcribe-video ./videos -r --smart-names -p 'text:Lecture ' -p 'counter:NN'
- Transcribe a single file to a custom folder:
study transcribe-video ./videos/intro.mp4 -o ./transcripts
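
To give a feel for what "composable filename prefixes" could mean, here is a small Python sketch. It is an illustration only: the build_prefix helper is hypothetical, and it assumes that a text: spec is inserted literally and that the number of N characters in a counter: spec sets the zero-padded width. The real tool may interpret these specs differently.

```python
def build_prefix(specs, index):
    """Compose a filename prefix from ordered 'kind:value' specs.

    Assumptions (not confirmed by the tool's documentation):
    - 'text:<literal>' contributes the literal text as-is.
    - 'counter:NN' contributes the file's index, zero-padded to len('NN').
    """
    parts = []
    for spec in specs:
        kind, _, value = spec.partition(":")
        if kind == "text":
            parts.append(value)
        elif kind == "counter":
            parts.append(str(index).zfill(len(value)))
        else:
            raise ValueError(f"unknown prefix spec: {spec!r}")
    return "".join(parts)


if __name__ == "__main__":
    specs = ["text:Lecture ", "counter:NN"]
    for i, name in enumerate(["intro", "basics"], start=1):
        print(build_prefix(specs, i) + " " + name)
```

Under these assumptions, the specs from the example above would give the third file the prefix "Lecture 03".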
study markdown-to-pdf OUTPUT.pdf INPUTS... [options]
- Convert Markdown to a single PDF using WeasyPrint, with configurable paper size, margins, optional title page, and optional table of contents.
Examples:
study markdown-to-pdf out.pdf notes.md --toc --paper-size a4
study markdown-to-pdf out.pdf docs/ --level-limit 2 --margin "1in 0.75in"
study markdown-to-pdf out.pdf README.md --title-page --title "My Guide" --author "Me"

study init [options]
- Provision the shared workspace directory and any missing subdirectories.
Examples:
study init
study init --path /tmp/study-utils

study convert-markdown PATHS... [options]
- Convert PDFs, DOCX, HTML, TXT, and EPUB files into Markdown outputs with YAML front matter while preserving basenames.
- Default outputs land in <workspace>/converted; use --output-dir or the TOML config to redirect elsewhere.
- Respects layered config precedence (CLI flags > environment variables > convert_markdown.toml).
- Use study convert-markdown config init to scaffold the default template in the workspace config directory.
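
The precedence order above (CLI flags, then environment variables, then convert_markdown.toml) can be sketched in Python with collections.ChainMap. This is a minimal illustration, not the tool's implementation: resolve_settings is a hypothetical helper, the extra defaults layer is an assumption, and the setting keys simply mirror the --output-dir and --extensions flags shown in this README.

```python
from collections import ChainMap


def resolve_settings(cli, env, file_cfg, defaults):
    """Merge config layers so that earlier layers win over later ones.

    Values of None are treated as "not provided" and dropped, so an unset
    CLI flag does not mask a value from a lower-priority layer.
    """
    layers = [
        {key: value for key, value in layer.items() if value is not None}
        for layer in (cli, env, file_cfg, defaults)
    ]
    # ChainMap looks keys up in the first mapping that contains them.
    return dict(ChainMap(*layers))


if __name__ == "__main__":
    settings = resolve_settings(
        cli={"output_dir": "./out", "extensions": None},  # flag given / flag omitted
        env={"extensions": ["pdf"]},                      # parsed from the environment
        file_cfg={"output_dir": "converted", "extensions": ["pdf", "docx"]},
        defaults={"output_dir": "converted", "extensions": ["pdf"]},
    )
    print(settings)
```

Here output_dir comes from the CLI layer and extensions from the environment layer, because the CLI left that flag unset.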
Examples:
study convert-markdown ./docs --extensions pdf docx
study convert-markdown config init --workspace ~/.study-utils-data

study text-combiner OUTPUT INPUTS... [options]
- Combine text files with optional section titles and ordering. See --help.
study generate-document DOC_TYPE OUTPUT INPUTS... [options]
- Generate a Markdown document from reference files using prompts defined in a TOML config.
- Resolves documents.toml from the workspace config directory by default and falls back to the current directory. If no config is found, the CLI exits with guidance to run study generate-document config init.
- Use study generate-document config init to scaffold the default template in the workspace or a custom destination (--path, --workspace, --force).
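
The documents.toml lookup order described above could look roughly like the following Python sketch. The helper name find_documents_config and the exact error message are hypothetical; only the search order (workspace config directory first, then the current directory) comes from the description above.

```python
from pathlib import Path


def find_documents_config(workspace, cwd):
    """Return the first documents.toml found, searching workspace config then cwd.

    Hypothetical helper illustrating the documented fallback order.
    """
    candidates = [
        Path(workspace) / "config" / "documents.toml",
        Path(cwd) / "documents.toml",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        "documents.toml not found; run 'study generate-document config init'"
    )


if __name__ == "__main__":
    try:
        print(find_documents_config(Path.home() / ".study-utils-data", Path.cwd()))
    except FileNotFoundError as exc:
        print(exc)
```

Raising an exception with the suggested command mirrors the "exits with guidance" behavior; a real CLI would likely turn that into a non-zero exit code.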
Example:
study generate-document reading_assignment notes.md ./materials --extensions txt md --level-limit 0

study quizzer [options]
- Launch the interactive Rich-based quiz session for drilling on generated questions.
study rag <subcommand> [options]
- Manage retrieval-augmented study databases and chat sessions.
Includes config, ingest, list, inspect, export, import, chat, and doctor helpers.
Examples:
study rag config init
study rag ingest --name physics-notes ./notes
study rag doctor