Ellora (Enhancing LLMs with LoRA) is a collection of standardized, high-quality LoRA recipes for enhancing Large Language Model capabilities. Instead of building new frameworks, we focus on creating reproducible training methodologies that work with existing infrastructure.
The LLM ecosystem has amazing infrastructure (LoRAX, PEFT, vLLM), but lacks standardized, high-quality capability adapters. Ellora bridges this gap by providing:
- Recipes, not frameworks - Reproducible training methodologies
- Quality-first approach - Rigorous evaluation and benchmarking
- Self-supervised data generation - No dependency on external datasets
- Infrastructure agnostic - Works with existing tools (PEFT, LoRAX, etc.)
- Community-driven - Open recipes for the ecosystem
Recipe | Purpose | Key Achievement | Jump to |
---|---|---|---|
#1: Accuracy Recovery | Restore quantized model performance | <5% degradation from FP16 | Details |
#2: Reasoning Enhancement | Add structured thinking with <think> tags | 60% thinking usage, 75% quality boost | Details |
#3: Tool Calling | Enable effective development tool usage | 80% success rate on complex tasks | Details |
#4: Context Extension | Expand from 32K to 2M tokens | 61x context increase for full repos | Details |
#5: Secure Code Generation | Train models to write secure code by default | 97% vulnerability reduction | Details |
#6: Execution World Model | Add execution awareness to thinking models | 33% state prediction accuracy | Details |
Problem: Quantized models (INT4/INT8) lose accuracy compared to FP16 versions
Solution: Self-distillation LoRA adapter using Magpie-generated data
- Goal: <5% performance degradation from FP16 baseline
- Memory: ~75% reduction in model size
- Speed: 2-3x faster inference than FP16
- Method: Teacher (FP16) → Student (INT4+LoRA) distillation
Key Innovation: Uses Magpie self-data generation for perfect domain alignment - no external datasets needed!
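The teacher-to-student distillation described above can be sketched as a temperature-scaled KL divergence between the two models' next-token distributions. This is a minimal, framework-free illustration only: the function names, the temperature of 2.0, and the toy logits are assumptions, not the recipe's actual training code.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over one next-token distribution."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy logits over a 4-token vocabulary (hypothetical values).
teacher = [2.0, 1.0, 0.1, -1.0]   # stands in for the FP16 model
student = [1.5, 1.2, 0.3, -0.8]   # stands in for the INT4 + LoRA model
loss = distillation_loss(teacher, student)
identical = distillation_loss(teacher, teacher)
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is the signal a LoRA adapter on the quantized model would be trained to minimize.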
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
# Load quantized model
model = AutoModelForCausalLM.from_pretrained(
"Qwen/Qwen3-0.6B",
quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
)
# Load accuracy recovery adapter
model = PeftModel.from_pretrained(model, "codelion/Qwen3-0.6B-accuracy-recovery-lora")
# Use normally - now with recovered accuracy!
Model | Perplexity | Memory | Speed | Status |
---|---|---|---|---|
FP16 Baseline | 1.97 | 1.0GB | 1.0x | ✅ |
INT4 Raw | 2.40 (+21.8%) | 0.25GB | 3.2x | |
INT4 + Ellora | 2.09 (+5.7%) | 0.28GB | 3.0x | ✅ |
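For readers who want to reproduce the comparison, the sketch below shows how perplexity and the relative degradation in the table can be computed. The helper names and the toy log-probabilities are assumptions; the degradation example uses the rounded perplexities from the table, so small differences from the published percentages are to be expected.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def relative_degradation(baseline_ppl, candidate_ppl):
    """Percentage increase in perplexity over the baseline."""
    return (candidate_ppl - baseline_ppl) / baseline_ppl * 100.0

# Hypothetical natural-log probabilities for a 4-token sequence.
ppl = perplexity([math.log(0.5)] * 4)

# FP16 baseline vs. raw INT4, using the rounded table values.
int4_gap = relative_degradation(1.97, 2.40)
```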
Problem: LLMs often lack structured thinking patterns for complex reasoning
Solution: GRPO-trained adapter that teaches chain-of-thought with <think></think> tags
- Goal: Enhance reasoning capabilities through preference learning
- Method: GRPO (Group Relative Policy Optimization) with self-rewarding
- Feature: Teaches structured thinking with clear reasoning steps
- Output: Models that show their reasoning process transparently
Key Innovation: Self-generated preference data with automated quality scoring - no need for human annotations or external preference datasets!
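The core idea of GRPO, scoring several completions of the same prompt and normalizing each reward against its own group, can be sketched in a few lines. The reward function below is a hypothetical stand-in for the recipe's automated quality scoring (it only checks for a non-empty `<think>` block followed by an answer), and the sample completions are invented for illustration.

```python
import re
import statistics

THINK_BLOCK = re.compile(r"<think>(.+?)</think>", re.DOTALL)

def toy_reward(response):
    """Hypothetical reward: credit for a non-empty <think> block plus a final answer."""
    match = THINK_BLOCK.search(response)
    if match is None or not match.group(1).strip():
        return 0.0
    answer = response[match.end():].strip()
    return 1.0 if answer else 0.5

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled completions for the same prompt (hypothetical).
group = [
    "<think>60 mph for 2 h, then 90 mph for 1 h.</think> 210 miles",
    "210 miles",
    "<think>speed is 60 mph</think>",
    "<think>   </think> 200 miles",
]
rewards = [toy_reward(r) for r in group]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group mean receive a positive advantage and are reinforced; those below it are discouraged, so no separate value model or human preference labels are needed.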
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Load base model
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")
# Load reasoning adapter
model = PeftModel.from_pretrained(model, "codelion/gemma-3-1b-it-reasoning-grpo-lora")
# Use with thinking prompt
prompt = '''Think step by step and use <think></think> tags to show your reasoning process.
Problem: If a train travels 120 miles in 2 hours, then increases its speed by 30 mph for the next hour, how many total miles does it travel?
Response:'''
inputs = tokenizer(prompt, return_tensors="pt")
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.2)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
Model | Thinking Usage | Quality Score | Training Method | Status |
---|---|---|---|---|
Gemma-3-1B Base | 0% | 3.2 | - | |
Gemma-3-1B + Ellora | 60% | 5.6 | GRPO | ✅ |
Problem: LLMs struggle with effective tool usage for code exploration
Solution: Hybrid training with Magpie scenarios + real tool execution results
- Goal: Teach models to use development tools effectively
- Method: Generate scenarios with Magpie, execute on real codebases
- Feature: OpenAI-compatible function calling format
- Tools: File operations, search, code navigation, and more
Key Innovation: Combines synthetic scenario diversity with real execution feedback - ensuring models learn authentic tool usage patterns!
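To make the OpenAI-compatible format concrete, here is a small sketch of a tool schema, a model-emitted tool call, and a dispatcher that executes it for real. The `list_directory` tool, the registry, and the simulated call are all hypothetical; the recipe's actual tool set is not reproduced here.

```python
import json
import os
import tempfile

# Tool schema in the OpenAI-compatible "tools" layout; the tool itself is hypothetical.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "list_directory",
        "description": "List the entries of a directory in the repository.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

def list_directory(path):
    """Real execution: return the sorted entry names for a directory."""
    return sorted(os.listdir(path))

REGISTRY = {"list_directory": list_directory}

def execute_tool_call(tool_call):
    """Dispatch one model-emitted tool call and wrap the result as a tool message."""
    function = tool_call["function"]
    arguments = json.loads(function["arguments"])  # arguments arrive as a JSON string
    result = REGISTRY[function["name"]](**arguments)
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": json.dumps(result)}

# Simulate a model response against a throwaway directory.
with tempfile.TemporaryDirectory() as repo:
    open(os.path.join(repo, "main.py"), "w").close()
    call = {
        "id": "call_0",
        "type": "function",
        "function": {"name": "list_directory", "arguments": json.dumps({"path": repo})},
    }
    message = execute_tool_call(call)
```

Running the call against a real directory, rather than inventing its output, mirrors the "real execution feedback" half of the hybrid approach.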
Problem: Base models are limited to 32K tokens of context, but analyzing large repositories can require up to 2M tokens
Solution: Progressive curriculum learning with vLLM + Unsloth hybrid approach
- Goal: Extend context from 32K to 2M tokens (61x increase)
- Method: Curriculum learning across 4 stages (32K → 128K → 512K → 2M)
- Innovation: vLLM for fast data generation, Unsloth for memory-efficient training
- Feature: Single LoRA adapter progressively learns longer contexts
Key Innovation: Hybrid optimization combining vLLM's inference speed with Unsloth's training efficiency - achieving 61x context extension with minimal compute!
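The staged curriculum can be pictured as a loop that carries one adapter through progressively longer context lengths. The sketch below is schematic: `fake_train_stage` is a placeholder for real training, and the exact token counts are assumptions (32K taken as 32,768 and 2M as 2,000,000, which is what yields the 61x figure).

```python
# Context lengths per stage, following the 32K -> 128K -> 512K -> 2M schedule.
# The exact token counts are assumptions: 32K is taken as 32,768 and 2M as 2,000,000.
STAGES = [32_768, 131_072, 524_288, 2_000_000]

def run_curriculum(stages, train_stage):
    """Train one adapter through every stage, carrying its state forward."""
    adapter_state = None
    for index, max_tokens in enumerate(stages):
        adapter_state = train_stage(adapter_state, index, max_tokens)
    return adapter_state

def fake_train_stage(adapter_state, index, max_tokens):
    """Stand-in for real training: only records which stages have been seen."""
    history = list(adapter_state or [])
    history.append((index, max_tokens))
    return history

history = run_curriculum(STAGES, fake_train_stage)
extension_factor = STAGES[-1] / STAGES[0]
```

Because each stage starts from the previous stage's adapter state, the same LoRA weights are refined throughout rather than retrained from scratch at every length.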
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Load base model
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")
# Load progressive context adapter
model = PeftModel.from_pretrained(model, "codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora")
# Use with 2M token context - perfect for large repositories!
long_context_prompt = "Analyze this entire repository..." # Up to 2M tokens
inputs = tokenizer(long_context_prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=1024)
Model | Context Limit | Max Files | Use Case | Status |
---|---|---|---|---|
Qwen2.5-Coder Base | 32K tokens | ~10-20 files | Small projects | |
+ Stage 0 LoRA | 32K tokens | ~10-20 files | Single module analysis | ✅ |
+ Stage 1 LoRA | 128K tokens | ~50-100 files | Medium repositories | ✅ |
+ Stage 2 LoRA | 512K tokens | ~200-500 files | Large codebases | ✅ |
+ Stage 3 LoRA | 2M tokens | ~1000+ files | Entire repositories | ✅ |
Problem: LLMs frequently generate code with security vulnerabilities (SQL injection, etc.)
Solution: GRPO training with automated Semgrep analysis for security scoring
- Goal: Generate secure code by default without explicit prompting
- Method: Self-supervised training with automatic vulnerability detection
- Scoring: Partial credit system (40% functionality, 40% patterns, 20% vulnerabilities)
- Results: 97% reduction in vulnerabilities, 100% functional code
Key Innovation: Automated security analysis replaces manual curation - teaching secure patterns without labeled datasets!
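The 40/40/20 partial credit split can be illustrated with a simple weighted sum. Only the weights come from the description above; how each component is measured, the linear penalty per finding, and the cap of five findings are assumptions made for this sketch (in practice the finding count would presumably come from the Semgrep analysis).

```python
WEIGHTS = {"functionality": 0.40, "patterns": 0.40, "vulnerabilities": 0.20}

def partial_credit_score(functionality, patterns, vulnerability_count, max_vulnerabilities=5):
    """Blend three components, each in [0, 1], into a 0-100 score."""
    # Fewer findings means more credit; the linear penalty and the cap are assumptions.
    clean = 1.0 - min(vulnerability_count, max_vulnerabilities) / max_vulnerabilities
    components = {"functionality": functionality, "patterns": patterns, "vulnerabilities": clean}
    return 100.0 * sum(WEIGHTS[name] * value for name, value in components.items())

# Fully functional, fully secure patterns, no findings.
perfect = partial_credit_score(functionality=1.0, patterns=1.0, vulnerability_count=0)
# Functional, half the secure patterns, many findings.
flawed = partial_credit_score(functionality=1.0, patterns=0.5, vulnerability_count=5)
```

Giving partial credit instead of a pass/fail signal lets the policy receive a useful gradient even when a completion is only partly secure.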
from transformers import AutoModelForCausalLM
from peft import PeftModel
# Load base model
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")
# Load security adapter
model = PeftModel.from_pretrained(model, "codelion/Qwen2.5-Coder-0.5B-Instruct-security-grpo-lora")
# Generate secure code by default
prompt = "Create a function to search for products by name in a database"
# Model will automatically use parameterized queries!
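For reference, this is the kind of pattern the prompt above is meant to elicit: a parameterized query instead of string concatenation. The example is hand-written for illustration, not actual model output, and the `products` table and its columns are hypothetical.

```python
import sqlite3

def search_products(connection, name):
    """Search products by name using a parameterized query."""
    # The ? placeholder keeps user input out of the SQL text entirely.
    cursor = connection.execute(
        "SELECT id, name FROM products WHERE name LIKE ? ORDER BY id",
        (f"%{name}%",),
    )
    return cursor.fetchall()

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
connection.executemany(
    "INSERT INTO products (name) VALUES (?)",
    [("Red Lamp",), ("Blue Lamp",), ("Desk",)],
)

lamps = search_products(connection, "Lamp")
# A classic injection payload is treated as plain text and matches nothing.
injected = search_products(connection, "' OR '1'='1")
```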
Metric | Base Model | + Security LoRA | Improvement |
---|---|---|---|
Vulnerability Score | 12.3 | 0.40 | -97% |
Functional Code | 95% | 100% | +5% |
Partial Credit Score | - | 61.2/100 | - |
Uses Secure Patterns | 5% | 76% | +1420% |
Problem: LLMs can generate and reason about code, but lack execution awareness - understanding how code behaves at runtime, predicting variable states, and comprehending dynamic program behavior
Solution: GRPO-trained adapter combining Qwen3's native thinking with real execution traces, inspired by Meta's CWM (Code World Model)
- Goal: Add execution awareness to thinking models
- Method: Hybrid Magpie-style generation + real Python execution tracing
- Feature: Predict program states, debug with execution understanding
- Model: Built on Qwen3-4B-Thinking-2507 with 262K context
Key Innovation: Combines Qwen3's thinking capabilities with real execution traces captured via Python's trace module - creating a "neural debugger" that understands both logic AND runtime behavior!
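A ground-truth execution trace of the sort described above can be captured with a few lines of Python. Note the substitution: this sketch uses `sys.settrace`, the lower-level hook that the `trace` module builds on, because it gives direct access to variable states. The `trace_states` helper and its output format are assumptions, not the recipe's actual data pipeline.

```python
import sys

def trace_states(source):
    """Run source and record (line number, variables) just before each line executes."""
    states = []
    code = compile(source, "<traced>", "exec")

    def tracer(frame, event, arg):
        # Only follow frames that belong to the traced snippet.
        if frame.f_code.co_filename != "<traced>":
            return None
        if event == "line":
            snapshot = {k: v for k, v in frame.f_locals.items() if not k.startswith("__")}
            states.append((frame.f_lineno, snapshot))
        return tracer

    namespace = {}
    sys.settrace(tracer)
    try:
        exec(code, namespace)
    finally:
        sys.settrace(None)  # always remove the hook, even if the snippet raises
    final = {k: v for k, v in namespace.items() if not k.startswith("__")}
    return states, final

states, final = trace_states("x = 10\ny = x * 2\nz = x + y\n")
```

For this three-line snippet, `final` holds `x = 10`, `y = 20`, and `z = 30`, and each entry in `states` pairs a line number with the variables visible just before that line runs. Pairs like these are what a model's predicted trace could be scored against.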
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Load base model
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B-Thinking-2507")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Thinking-2507")
# Load execution world model adapter
model = PeftModel.from_pretrained(model, "codelion/Qwen3-4B-execution-world-model-lora")
# Analyze code with execution awareness
code = """
x = 10
y = x * 2
z = x + y
"""
prompt = f"Analyze this code and predict its execution trace step by step:\n\n```python\n{code}\n```"
inputs = tokenizer(prompt, return_tensors="pt")
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.1)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
# Model will predict variable states at each line!
Metric | Value | Training Data | Status |
---|---|---|---|
Overall Accuracy | 20.0% | 298 samples | 🚧 |
Mean State Accuracy | 33.3% | Self-generated | 🚧 |
Base Model | Qwen3-4B-Thinking | 262K context | ✅ |
Training Method | GRPO | Execution traces | ✅ |
All models trained using Ellora recipes are available on HuggingFace:
- codelion/Qwen3-0.6B-accuracy-recovery-lora - Accuracy recovery for Qwen3-0.6B
- codelion/gemma-3-1b-it-reasoning-grpo-lora - Reasoning enhancement for Gemma-3-1B
- codelion/Llama-3.2-1B-Instruct-tool-calling-lora - Tool calling for Llama-3.2-1B
- codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora - 2M context extension for Qwen2.5-Coder-0.5B
- codelion/Qwen2.5-Coder-0.5B-Instruct-security-grpo-lora - Secure code generation for Qwen2.5-Coder-0.5B
- codelion/Qwen3-4B-execution-world-model-lora - Execution-aware world model for Qwen3-4B-Thinking
- More models coming as we test recipes across different model families!
If you use Ellora recipes in your research, please cite:
@misc{ellora2024,
title={Ellora: Enhancing LLMs with LoRA - Standardized Recipes for Capability Enhancement},
author={Asankhaya Sharma},
year={2024},
url={https://github.com/codelion/ellora}
}