A research project, built on the Terminal-Bench and CAMEL-AI frameworks, exploring how AI agents can use context from previous tasks to improve performance on later ones.
This project investigates the use of memory and context summarization in terminal-based AI agents. We implement and compare different agent architectures to understand how agents can learn from past experiences and apply that knowledge to new tasks.
- Context-Aware Agents: Agents that can load and utilize summaries from previous task executions
- Terminal-Bench Integration: Uses the Terminal-Bench framework for standardized terminal task evaluation
- CAMEL-AI Summarization: Leverages CAMEL's `summarize()` functionality for intelligent context extraction
- Multiple Agent Implementations: Compare different approaches (simple OpenAI, CAMEL-based)
mem-cli-agent/
├── agents/ # Agent implementations
│ ├── mini_agent.py # Simple OpenAI-based agent
│ └── camel_agent.py # CAMEL-powered agent with memory
├── test.sh # Test scripts for agents
├── pyproject.toml # Project dependencies
├── README.md # Project documentation
└── .gitignore # Git ignore rules
- Memory Utilization: How can agents effectively use previous task summaries to improve performance?
- Context Transfer: What information from past tasks is most valuable for future tasks?
- Learning Efficiency: Do memory-enabled agents show measurable improvement over stateless agents?
mini_agent.py:
- Simple agent based on OpenAI GPT-4o-mini
- Stateless execution
- Baseline for comparison
camel_agent.py:
- CAMEL-AI powered agent
- Memory-enabled with context summarization
- Can load previous task summaries via the `summary_path` parameter
- Automatically generates summaries after task completion
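One way a previous summary could reach the model is by appending it to the system prompt before the task starts. The sketch below is illustrative only: `build_system_prompt`, the section heading, and the prompt layout are hypothetical and are not taken from `camel_agent.py`, which may inject the summary differently.

```python
from pathlib import Path
from typing import Optional


def build_system_prompt(base_prompt: str, summary_path: Optional[str] = None) -> str:
    """Return the base prompt, extended with a previous task summary if given.

    Hypothetical helper, written only to illustrate the idea.
    """
    if summary_path is None:
        # Stateless case: nothing to add.
        return base_prompt

    summary = Path(summary_path).read_text(encoding="utf-8").strip()
    if not summary:
        # An empty summary file adds no context, so keep the prompt unchanged.
        return base_prompt

    # Keep the summary in its own clearly marked section so the model can
    # tell prior context apart from the current task instructions.
    return f"{base_prompt}\n\n## Summary of a previous task\n{summary}"
```

Placing the summary after the base instructions keeps the stateless and memory-enabled prompts identical up to the added section, which makes the two configurations easier to compare.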
# Test agents
./test.sh
from agents.camel_agent import CamelTerminus
# Agent without previous context
agent = CamelTerminus()
# Agent with previous context
agent = CamelTerminus(summary_path="path/to/previous/summary.md")
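When tasks run in sequence, the first run has no summary to load. A small guard can choose the constructor arguments so that one code path covers both cases. `agent_kwargs` is a hypothetical helper; only the `summary_path` parameter comes from the usage above.

```python
from pathlib import Path
from typing import Dict, Optional


def agent_kwargs(summary_path: Optional[str]) -> Dict[str, str]:
    """Build keyword arguments for the agent constructor.

    Returns an empty dict when no summary file exists, so the agent
    starts without previous context.
    """
    if summary_path and Path(summary_path).is_file():
        return {"summary_path": summary_path}
    return {}


# Intended use (assumes the CamelTerminus import from the usage snippet):
# agent = CamelTerminus(**agent_kwargs("path/to/previous/summary.md"))
```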
- `terminal-bench>=0.2.16` - Terminal task evaluation framework
- `openai>=1.0.0` - OpenAI API client
- `camel-ai` - CAMEL-AI multi-agent framework (from GitHub)
# Clone the repository
git clone https://github.com/camel-ai/mem-cli-agent
cd mem-cli-agent
# Install dependencies
pip install -e .
# Initialize CAMEL submodule
git submodule update --init --recursive
cd camel/
make install-editable
- Baseline Evaluation: Test agents on standard terminal tasks without memory
- Memory Integration: Enable context loading and measure performance differences
- Context Analysis: Analyze what types of summaries are most effective
- Comparative Study: Compare memory-enabled vs stateless agent performance
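The comparative step can be reduced to a difference in task success rates between the two agents. The sketch below assumes each run yields a mapping from task ID to a pass/fail flag. That result format and the names `success_rate`, `compare_runs`, and `Comparison` are hypothetical; this is not Terminal-Bench's actual output schema.

```python
from typing import Dict, NamedTuple


class Comparison(NamedTuple):
    tasks: int        # number of task IDs present in both runs
    stateless: float  # success rate of the stateless agent on those tasks
    memory: float     # success rate of the memory-enabled agent on those tasks
    delta: float      # memory minus stateless


def success_rate(results: Dict[str, bool]) -> float:
    """Fraction of tasks that passed; 0.0 for an empty result set."""
    if not results:
        return 0.0
    return sum(results.values()) / len(results)


def compare_runs(stateless: Dict[str, bool], with_memory: Dict[str, bool]) -> Comparison:
    """Compare two runs, restricted to the task IDs they have in common."""
    shared = set(stateless) & set(with_memory)
    base = success_rate({t: stateless[t] for t in shared})
    mem = success_rate({t: with_memory[t] for t in shared})
    return Comparison(tasks=len(shared), stateless=base, memory=mem, delta=mem - base)
```

Restricting the comparison to shared task IDs keeps the two rates comparable when one run was interrupted or skipped tasks.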
This is a research project exploring agent memory and context utilization. Contributions, ideas, and discussions about agent memory architectures are welcome.
This project is for research purposes. See individual component licenses (Terminal-Bench, CAMEL-AI) for their respective terms.
- Terminal-Bench - Terminal task evaluation framework
- CAMEL-AI - Multi-agent conversation framework