
ModelForge 🔧⚡

Finetune LLMs on your laptop’s GPU—no code, no PhD, no hassle.


🚀 Features

  • GPU-Powered Finetuning: Optimized for NVIDIA GPUs (even 4GB VRAM).
  • One-Click Workflow: Upload data → Pick task → Train → Test.
  • Hardware-Aware: Auto-detects your GPU/CPU and recommends models (see the sketch after this list).
  • React UI: No CLI or notebooks—just a friendly interface.
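
To make the hardware-aware idea concrete, here is a minimal detection sketch. The tier names mirror the low_end/mid_range/high_end profiles described under Contributing below; the VRAM cutoffs are illustrative assumptions, not ModelForge's actual thresholds.

    # Minimal sketch of hardware-aware profile selection. The VRAM cutoffs
    # are illustrative assumptions, not ModelForge's actual thresholds.
    import torch

    def detect_profile() -> str:
        if not torch.cuda.is_available():
            return "cpu"
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
        if vram_gb < 6:
            return "low_end"
        if vram_gb < 12:
            return "mid_range"
        return "high_end"

    print(detect_profile())  # e.g. "mid_range" on an 8 GB card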

📖 Supported Tasks

  • Text-Generation: Generates free-form text from the model's pretrained and finetuned knowledge. Ideal for customer support chatbots, story generators, social media script writers, code generators, and general-purpose chatbots.
  • Summarization: Condenses long articles and documents into short summaries. Ideal for news articles, legal documents, and medical papers.
  • Extractive Question Answering: Extracts the answer to a query from a given context passage. Best for Retrieval Augmented Generation (RAG) and enterprise document search (for example, finding information in internal documentation).

Installation

Prerequisites

  • Python 3.11.x: Ensure a 3.11 interpreter is installed.
  • NVIDIA GPU: 6 GB or more of VRAM is recommended.
  • CUDA: Ensure CUDA is installed and configured for your GPU.
  • HuggingFace Account: Create an account on Hugging Face and generate a fine-grained access token.

Steps

  1. Install the Package:

    pip install modelforge-finetuning
  2. Set HuggingFace API Key in environment variables:
    Linux:

    export HUGGINGFACE_TOKEN=your_huggingface_token

    Windows Powershell:

    $env:HUGGINGFACE_TOKEN="your_huggingface_token"

    Windows CMD:

    set HUGGINGFACE_TOKEN=your_huggingface_token

    Or use a .env file:

    echo "HUGGINGFACE_TOKEN=your_huggingface_token" > .env
  3. Install the PyTorch Build Matching Your CUDA Version:

    • Navigate to the PyTorch installation page (https://pytorch.org/get-started/locally/) and select the CUDA version that matches your system.
    • Install PyTorch with that CUDA version. For example, for CUDA 12.6 on Windows:
     pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
    • You can verify the token and GPU setup with the sanity-check snippet after this list.
  4. Run the Application:

    modelforge run
  5. Done! Navigate to http://localhost:8000 in your browser and get started.
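
If the app fails to start, a quick sanity check can confirm that the token is visible to Python and that PyTorch can see your GPU. A minimal snippet, not part of ModelForge itself:

    # Sanity check: is the HuggingFace token set, and can PyTorch see a GPU?
    import os

    import torch
    from huggingface_hub import whoami

    token = os.environ.get("HUGGINGFACE_TOKEN")
    assert token, "HUGGINGFACE_TOKEN is not set in this shell"
    print("Logged in as:", whoami(token=token)["name"])

    assert torch.cuda.is_available(), "PyTorch cannot see a CUDA GPU"
    print("GPU:", torch.cuda.get_device_name(0))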

Running the Application Again in the Future

  1. Start the Application:
    modelforge run
  2. Navigate to the App:
    Open your browser and go to http://localhost:8000.

Stopping the Application

To stop the application and free up resources, press Ctrl+C in the terminal running the app.

📂 Dataset Format

{"input": "Enter a really long article here...", "output": "Short summary."},
{"input": "Enter the poem topic here...", "output": "Roses are red..."}

🤝 Contributing Model Recommendations

ModelForge uses a modular configuration system for model recommendations. Contributors can easily add new recommended models by adding configuration files to the model_configs/ directory. Each hardware profile (low_end, mid_range, high_end) has its own configuration file where you can specify primary and alternative models for different tasks.

See the Model Configuration Guide for detailed instructions on how to add new model recommendations.
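
As a rough illustration of what a profile file might encode, here is a hypothetical sketch as a Python dict. The field names and model choices are assumptions; the authoritative schema is in the guide above.

    # Hypothetical sketch of a hardware-profile entry. Field names and model
    # choices are assumptions; see the Model Configuration Guide for the real
    # schema used by the files in model_configs/.
    mid_range_profile = {
        "text-generation": {
            "primary": "facebook/opt-1.3b",
            "alternatives": ["EleutherAI/gpt-neo-1.3B"],
        },
        "summarization": {
            "primary": "facebook/bart-large-cnn",
            "alternatives": ["google/pegasus-xsum"],
        },
        "extractive-question-answering": {
            "primary": "deepset/roberta-base-squad2",
            "alternatives": ["distilbert-base-cased-distilled-squad"],
        },
    }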

🛠 Tech Stack

  • transformers + peft (LoRA finetuning)
  • bitsandbytes (4-bit quantization)
  • FastAPI + Python (backend)
  • React (frontend)
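
For a sense of how the finetuning pieces fit together, here is a minimal LoRA-plus-4-bit sketch built on the stack above. The model name and hyperparameters are illustrative, not ModelForge's actual defaults.

    # Minimal LoRA + 4-bit quantization recipe with transformers, peft, and
    # bitsandbytes. Model choice and hyperparameters are illustrative only.
    import torch
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "facebook/opt-350m",  # illustrative; ModelForge picks a model for your hardware
        quantization_config=bnb_config,
        device_map="auto",
    )
    lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()  # only the small LoRA adapters train

Training just the adapters on a quantized base model is what makes finetuning viable on 4-6 GB cards.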
