🎯 Ellora: Enhancing LLMs with LoRA

Ellora (Enhancing LLMs with LoRA) is a collection of standardized, high-quality LoRA recipes for enhancing Large Language Model capabilities. Instead of building new frameworks, we focus on creating reproducible training methodologies that work with existing infrastructure.

🌟 Philosophy

The LLM ecosystem has amazing infrastructure (LoRAX, PEFT, vLLM), but lacks standardized, high-quality capability adapters. Ellora bridges this gap by providing:

  • 📋 Recipes, not frameworks - Reproducible training methodologies
  • 🎯 Quality-first approach - Rigorous evaluation and benchmarking
  • 🔄 Self-supervised data generation - No dependency on external datasets
  • 🏗️ Infrastructure agnostic - Works with existing tools (PEFT, LoRAX, etc.)
  • 🌍 Community-driven - Open recipes for the ecosystem

📚 Recipe Collection

| Recipe | Purpose | Key Achievement | Jump to |
|---|---|---|---|
| #1: Accuracy Recovery | Restore quantized model performance | <5% degradation from FP16 | Details |
| #2: Reasoning Enhancement | Add structured thinking with <think> tags | 60% thinking usage, 75% quality boost | Details |
| #3: Tool Calling | Enable effective development tool usage | 80% success rate on complex tasks | Details |
| #4: Context Extension | Expand from 32K to 2M tokens | 61x context increase for full repos | Details |
| #5: Secure Code Generation | Train models to write secure code by default | 97% vulnerability reduction | Details |
| #6: Execution World Model | Add execution awareness to thinking models | 33% state prediction accuracy | Details |

🍳 Available Recipes

Recipe #1: Accuracy Recovery LoRA

Problem: Quantized models (INT4/INT8) lose accuracy compared to FP16 versions
Solution: Self-distillation LoRA adapter using Magpie-generated data

  • 🎯 Goal: <5% performance degradation from FP16 baseline
  • 💾 Memory: ~75% reduction in model size
  • ⚡ Speed: 2-3x faster inference than FP16
  • 📊 Method: Teacher (FP16) → Student (INT4+LoRA) distillation

Key Innovation: Uses Magpie self-data generation for perfect domain alignment - no external datasets needed!
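
The teacher-to-student step can be illustrated with a small distillation loss. This is a minimal sketch, assuming a per-token KL-divergence objective between the FP16 teacher's and the INT4+LoRA student's output distributions; the recipe's actual loss is not stated here, and the names `softmax` and `distillation_kl` are made up for the illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_kl(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) for a single token position (assumed objective)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy logits over a three-token vocabulary (illustrative values only).
teacher = [2.0, 1.0, 0.1]
student_close = [1.9, 1.1, 0.1]
student_far = [0.1, 1.0, 2.0]

print(distillation_kl(teacher, student_close))
print(distillation_kl(teacher, student_far))
```

A student whose distribution tracks the teacher's gets a loss near zero, so minimizing this quantity over the adapter weights pulls the quantized model back toward FP16 behavior.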

Quick Start

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Load quantized model
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
)

# Load accuracy recovery adapter
model = PeftModel.from_pretrained(model, "codelion/Qwen3-0.6B-accuracy-recovery-lora")

# Use normally - now with recovered accuracy!

Results

| Model | Perplexity | Memory | Speed | Status |
|---|---|---|---|---|
| FP16 Baseline | 1.97 | 1.0GB | 1.0x | ✅ |
| INT4 Raw | 2.40 (+21.8%) | 0.25GB | 3.2x | ⚠️ |
| INT4 + Ellora | 2.09 (+5.7%) | 0.28GB | 3.0x | ✅ |
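
For readers who want to reproduce this kind of comparison, perplexity is the exponential of the mean negative log-likelihood per token. The sketch below shows that definition and the relative-change arithmetic; the helper names are hypothetical and the log-probabilities are toy values, not outputs of the models above.

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood (natural log) per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def relative_change(baseline, value):
    """Percentage change of `value` relative to `baseline`."""
    return (value - baseline) / baseline * 100.0

# Toy example: every token predicted with probability 0.5.
toy_log_probs = [math.log(0.5)] * 4
print(perplexity(toy_log_probs))

# Relative degradation of the raw INT4 model, using the perplexities reported above.
print(round(relative_change(1.97, 2.40), 1))
```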

Recipe #2: Reasoning LoRA with GRPO

Problem: LLMs often lack structured thinking patterns for complex reasoning
Solution: GRPO-trained adapter that teaches chain-of-thought with <think></think> tags

  • 🧠 Goal: Enhance reasoning capabilities through preference learning
  • 📝 Method: GRPO (Group Relative Policy Optimization) with self-rewarding
  • 🎯 Feature: Teaches structured thinking with clear reasoning steps
  • 💡 Output: Models that show their reasoning process transparently
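
The "group relative" part of GRPO can be sketched in a few lines: several responses are sampled for the same prompt, each gets a reward, and each reward is normalized against its own group. This is an illustrative sketch rather than the recipe's training code; it uses the population standard deviation, and real implementations may differ in that detail.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Rewards for four sampled responses to one prompt (made-up values).
rewards = [0.0, 0.5, 0.5, 1.0]
print(group_relative_advantages(rewards))
```

Responses scoring above the group average receive positive advantages and are reinforced, while below-average ones are discouraged, which is why no separate value model is needed.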

Key Innovation: Self-generated preference data with automated quality scoring - no need for human annotations or external preference datasets!
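
To make "automated quality scoring" concrete, here is a hypothetical reward function of the kind such a pipeline could use. The scoring rules, weights, and the name `thinking_reward` are assumptions for illustration and are not taken from the recipe.

```python
import re

# Matches the first <think>...</think> block, across multiple lines.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def thinking_reward(response, min_words=5):
    """Hypothetical score in [0, 1] for structured-thinking behavior."""
    match = THINK_BLOCK.search(response)
    if match is None:
        return 0.0  # no thinking block at all
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    score = 0.5  # used the tags
    if len(reasoning.split()) >= min_words:
        score += 0.25  # reasoning is more than a token gesture
    if answer:
        score += 0.25  # a final answer follows the reasoning
    return score

good = "<think>60 mph for 2 hours, then 90 mph for 1 hour</think> 210 miles"
print(thinking_reward(good))
print(thinking_reward("210 miles"))
```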

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")

# Load reasoning adapter
model = PeftModel.from_pretrained(model, "codelion/gemma-3-1b-it-reasoning-grpo-lora")

# Use with thinking prompt
prompt = '''Think step by step and use <think></think> tags to show your reasoning process.

Problem: If a train travels 120 miles in 2 hours, then increases its speed by 30 mph for the next hour, how many total miles does it travel?

Response:'''

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.2)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

Results

| Model | Thinking Usage | Quality Score | Training Method | Status |
|---|---|---|---|---|
| Gemma-3-1B Base | 0% | 3.2 | - | ⚠️ |
| Gemma-3-1B + Ellora | 60% | 5.6 | GRPO | ✅ |

Recipe #3: Tool Calling LoRA

Problem: LLMs struggle with effective tool usage for code exploration
Solution: Hybrid training with Magpie scenarios + real tool execution results

  • 🛠️ Goal: Teach models to use development tools effectively
  • 🔄 Method: Generate scenarios with Magpie, execute on real codebases
  • 🎯 Feature: OpenAI-compatible function calling format
  • 💻 Tools: File operations, search, code navigation, and more

Key Innovation: Combines synthetic scenario diversity with real execution feedback - ensuring models learn authentic tool usage patterns!
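
As an illustration of the OpenAI-compatible function calling format, the sketch below defines one tool schema and executes a tool call against it. The `read_file` tool, the in-memory `FAKE_REPO`, and the `execute_tool_call` helper are hypothetical stand-ins; the recipe's real tool set runs against actual codebases.

```python
import json

# Tool definition in the OpenAI function-calling schema.
READ_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Return the text content of a file.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}

# Hypothetical in-memory repository standing in for a real checkout.
FAKE_REPO = {"README.md": "# demo"}

def read_file(path):
    return FAKE_REPO.get(path, f"error: {path} not found")

TOOL_REGISTRY = {"read_file": read_file}

def execute_tool_call(tool_call):
    """Run one model-emitted tool call and wrap the result as a tool message."""
    function = tool_call["function"]
    handler = TOOL_REGISTRY.get(function["name"])
    if handler is None:
        content = f"error: unknown tool {function['name']}"
    else:
        # Arguments arrive as a JSON-encoded string in this format.
        content = handler(**json.loads(function["arguments"]))
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": content}

call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "read_file", "arguments": json.dumps({"path": "README.md"})},
}
print(execute_tool_call(call))
```

Feeding the returned tool message back into the conversation is what gives the model real execution feedback instead of imagined results.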

Recipe #4: Progressive Context Extension LoRA

Problem: Base models are limited to a 32K context, while analyzing large repositories can require up to 2M tokens
Solution: Progressive curriculum learning with vLLM + Unsloth hybrid approach

  • 📈 Goal: Extend context from 32K to 2M tokens (61x increase)
  • 🎓 Method: Curriculum learning across 4 stages (32K → 128K → 512K → 2M)
  • ⚡ Innovation: vLLM for fast data generation, Unsloth for memory-efficient training
  • 🔍 Feature: Single LoRA adapter progressively learns longer contexts

Key Innovation: Hybrid optimization combining vLLM's inference speed with Unsloth's training efficiency - achieving 61x context extension with minimal compute!
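
The curriculum arithmetic can be checked with a short sketch. It assumes that 32K, 128K, and 512K are powers of two (32,768 and so on) and that 2M means 2,000,000 tokens, which is an assumption based on the "2000k" in the adapter name; under that reading the 61x figure falls out directly.

```python
# Assumed token limits per curriculum stage (see the assumptions above).
STAGES = [
    ("stage 0", 32 * 1024),
    ("stage 1", 128 * 1024),
    ("stage 2", 512 * 1024),
    ("stage 3", 2_000_000),
]

def extension_factor(base_tokens, target_tokens):
    """How many times longer the target context is than the base context."""
    return target_tokens / base_tokens

def curriculum(stages):
    """Return (name, max_tokens, growth_over_previous_stage) for each stage."""
    plan = []
    previous = stages[0][1]
    for name, max_tokens in stages:
        plan.append((name, max_tokens, max_tokens / previous))
        previous = max_tokens
    return plan

for name, max_tokens, growth in curriculum(STAGES):
    print(f"{name}: {max_tokens:>9,} tokens ({growth:.2f}x previous)")

print(int(extension_factor(STAGES[0][1], STAGES[-1][1])))
```

Each stage grows the window by roughly 4x, so the adapter never has to jump straight from short to very long sequences.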

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")

# Load progressive context adapter
model = PeftModel.from_pretrained(model, "codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora")

# Use with 2M token context - perfect for large repositories!
long_context_prompt = "Analyze this entire repository..." # Up to 2M tokens
inputs = tokenizer(long_context_prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=1024)

Results

| Model | Context Limit | Max Files | Use Case | Status |
|---|---|---|---|---|
| Qwen2.5-Coder Base | 32K tokens | ~10-20 files | Small projects | ⚠️ |
| + Stage 0 LoRA | 32K tokens | ~10-20 files | Single module analysis | ✅ |
| + Stage 1 LoRA | 128K tokens | ~50-100 files | Medium repositories | ✅ |
| + Stage 2 LoRA | 512K tokens | ~200-500 files | Large codebases | ✅ |
| + Stage 3 LoRA | 2M tokens | ~1000+ files | Entire repositories | ✅ |

Recipe #5: Secure Code Generation LoRA

Problem: LLMs frequently generate code with security vulnerabilities (SQL injection, etc.)
Solution: GRPO training with automated Semgrep analysis for security scoring

  • 🔒 Goal: Generate secure code by default without explicit prompting
  • 🛡️ Method: Self-supervised training with automatic vulnerability detection
  • 📊 Scoring: Partial credit system (40% functionality, 40% patterns, 20% vulnerabilities)
  • ✅ Results: 97% reduction in vulnerabilities, 100% functional code

Key Innovation: Automated security analysis replaces manual curation - teaching secure patterns without labeled datasets!
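
The 40/40/20 partial credit split can be read as a weighted sum. The sketch below assumes each component is first normalized to a value between 0 and 1; how the recipe derives those component values from Semgrep findings is not shown here, and `partial_credit_score` is a hypothetical name.

```python
def partial_credit_score(functionality, secure_patterns, vulnerability_free):
    """Combine three component scores in [0, 1] into a 0-100 reward.

    Weights follow the stated split: 40% functionality, 40% secure
    patterns, 20% absence of detected vulnerabilities.
    """
    components = (functionality, secure_patterns, vulnerability_free)
    if any(not 0.0 <= c <= 1.0 for c in components):
        raise ValueError("component scores must be between 0 and 1")
    return 40 * functionality + 40 * secure_patterns + 20 * vulnerability_free

# Working code that uses half of the expected secure patterns but still
# triggers a vulnerability finding (illustrative inputs).
print(partial_credit_score(1.0, 0.5, 0.0))
```

Graded rewards like this give GRPO a useful signal even when a sample is only partly secure, instead of an all-or-nothing pass/fail.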

Quick Start

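from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and its tokenizer
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")

# Load security adapter
model = PeftModel.from_pretrained(model, "codelion/qwen2.5-coder-security-grpo-lora")

# Generate code without any security hint in the prompt
prompt = "Create a function to search for products by name in a database"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# The adapter is trained to favor parameterized queries here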
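
For reference, the secure pattern in question looks like the following. This is a hand-written illustration using Python's built-in `sqlite3` module, not actual model output; the table layout and the `search_products` name are assumptions.

```python
import sqlite3

def search_products(conn, name):
    """Search products by name using a parameterized query."""
    # The ? placeholder keeps user input out of the SQL text itself,
    # so the input is always treated as data, never as SQL.
    query = "SELECT id, name FROM products WHERE name LIKE ? ORDER BY id"
    return conn.execute(query, (f"%{name}%",)).fetchall()

# Small in-memory database for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO products (name) VALUES (?)",
    [("Red Mug",), ("Blue Mug",), ("Lamp",)],
)

print(search_products(conn, "Mug"))
# A classic injection string is matched literally and finds nothing.
print(search_products(conn, "' OR '1'='1"))
```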

Results

Metric Base Model + Security LoRA Improvement
Vulnerability Score 12.3 0.40 -97%
Functional Code 95% 100% +5%
Partial Credit Score - 61.2/100 -
Uses Secure Patterns 5% 76% +1420%

Recipe #6: Execution World Model Thinking LoRA

Problem: LLMs can generate and reason about code, but lack execution awareness - understanding how code behaves at runtime, predicting variable states, and comprehending dynamic program behavior
Solution: GRPO-trained adapter combining Qwen3's native thinking with real execution traces, inspired by Meta's CWM (Code World Model)

  • 🧠 Goal: Add execution awareness to thinking models
  • 🔍 Method: Hybrid Magpie-style generation + real Python execution tracing
  • 📊 Feature: Predict program states, debug with execution understanding
  • 💡 Model: Built on Qwen3-4B-Thinking-2507 with 262K context

Key Innovation: Combines Qwen3's thinking capabilities with real execution traces captured via Python's trace module - creating a "neural debugger" that understands both logic AND runtime behavior!
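
To show what a ground-truth execution trace can look like, here is a minimal sketch that records variable states line by line. Note the swap: it uses `sys.settrace` rather than the `trace` module mentioned above, because `sys.settrace` exposes local variables directly. The `trace_states` helper is hypothetical and not the recipe's own tracer.

```python
import sys

def trace_states(source):
    """Execute `source` and return [(line_number, variables_after_line), ...]."""
    code = compile(source, "<snippet>", "exec")
    line_numbers = []
    snapshots = []

    def tracer(frame, event, arg):
        # Ignore frames that do not belong to the snippet itself.
        if frame.f_code is not code:
            return None
        if event == "line":
            line_numbers.append(frame.f_lineno)
        if event in ("line", "return"):
            # Copy the visible variables, skipping names such as __builtins__.
            snapshots.append(
                {k: v for k, v in frame.f_locals.items() if not k.startswith("__")}
            )
        return tracer

    sys.settrace(tracer)
    try:
        exec(code, {})
    finally:
        sys.settrace(None)
    # A "line" event fires before the line runs, so the state after line i
    # is the snapshot taken at the following event.
    return list(zip(line_numbers, snapshots[1:]))

for line_number, variables in trace_states("x = 10\ny = x * 2\nz = x + y\n"):
    print(line_number, variables)
```

Pairs of source code and traces like these can serve as the reference against which a model's predicted states are scored.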

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B-Thinking-2507")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Thinking-2507")

# Load execution world model adapter
model = PeftModel.from_pretrained(model, "codelion/Qwen3-4B-execution-world-model-lora")

# Analyze code with execution awareness
code = """
x = 10
y = x * 2
z = x + y
"""

prompt = f"Analyze this code and predict its execution trace step by step:\n\n```python\n{code}\n```"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.1)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
# Model will predict variable states at each line!

Results

| Metric | Value | Training Data | Status |
|---|---|---|---|
| Overall Accuracy | 20.0% | 298 samples | 🚧 |
| Mean State Accuracy | 33.3% | Self-generated | 🚧 |
| Base Model | Qwen3-4B-Thinking | 262K context | ✅ |
| Training Method | GRPO | Execution traces | ✅ |
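
One plausible way to compute a state accuracy metric is sketched below. It assumes that state accuracy means the fraction of trace steps whose predicted variable values match the real ones exactly; the recipe's precise metric definitions are not given here, so treat both the definition and the `state_accuracy` name as assumptions.

```python
def state_accuracy(predicted, actual):
    """Fraction of trace steps where the predicted variables match exactly."""
    if not actual:
        return 0.0
    correct = sum(
        1
        for step, truth in enumerate(actual)
        if step < len(predicted) and predicted[step] == truth
    )
    return correct / len(actual)

# Real states after each of three lines, and a prediction that gets the
# last step wrong (illustrative values).
actual = [{"x": 10}, {"x": 10, "y": 20}, {"x": 10, "y": 20, "z": 30}]
predicted = [{"x": 10}, {"x": 10, "y": 20}, {"x": 10, "y": 20, "z": 40}]
print(state_accuracy(predicted, actual))
```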

🏆 Model Zoo

All models trained using Ellora recipes are available on HuggingFace:

Featured Models

🔬 Research & Citations

If you use Ellora recipes in your research, please cite:

@misc{ellora2024,
  title={Ellora: Enhancing LLMs with LoRA - Standardized Recipes for Capability Enhancement},
  author={Asankhaya Sharma},
  year={2024},
  url={https://github.com/codelion/ellora}
}

Key Papers & Inspirations