Introspect is a service that does data-focused deep research on structured data. It understands your structured data (databases or CSV/Excel files) and unstructured data (PDFs), and can query the web for additional context.
- Set up environment variables:

      # Create a .env file in your root folder
      # You need all 3 - not just one
      OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
      ANTHROPIC_API_KEY="YOUR_ANTHROPIC_API_KEY"
      GEMINI_API_KEY="YOUR_GEMINI_API_KEY"

- Start all services using Docker Compose: `docker compose up --build`
- Access the application in your browser:
  - Main application: http://localhost:80
  - Standalone Backend API: http://localhost:1235
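Since all three keys are required, it can help to check the `.env` file before starting the containers. The following is a minimal sketch, not part of Introspect: `parse_env` and `missing_keys` are hypothetical helper names, and the parser only handles simple `KEY="value"` lines like the ones shown above.

```python
# Hypothetical helper, not part of Introspect: report which of the three
# API keys named in the setup step are absent or empty in a .env file.
from pathlib import Path

REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY="value" lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes so KEY="" counts as empty.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str) -> list[str]:
    """Return the required keys that are absent or empty, in a fixed order."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    missing = missing_keys(text)
    print("All keys set." if not missing else f"Missing: {', '.join(missing)}")
```

This check only confirms that each key has a non-empty value. It does not detect a placeholder such as `YOUR_OPENAI_API_KEY` left unchanged, and it does not validate the keys against the providers.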
We use a simple AI agent with tool use. An LLM attempts to answer a user question with 3 tools – text_to_sql, web_search, and pdf_with_citations.
The model then recursively asks questions using one of these tools until it is satisfied that it has enough context to answer the user's question. By default, we use o3-mini for text-to-SQL, gemini-2.0-flash for web search, and claude-4-sonnet for both PDF analysis and orchestration.
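The loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Introspect's actual implementation: the three tool functions are stubs that share only their names with the real tools, `Step`, `run_agent`, and `scripted_decide` are hypothetical names, and the `max_steps` cap is an added safeguard that the description above does not mention.

```python
from dataclasses import dataclass
from typing import Callable, Optional


# Stub tools: same names as the tools described above, placeholder bodies.
def text_to_sql(question: str) -> str:
    return f"[sql result for: {question}]"


def web_search(question: str) -> str:
    return f"[web result for: {question}]"


def pdf_with_citations(question: str) -> str:
    return f"[pdf result for: {question}]"


TOOLS: dict[str, Callable[[str], str]] = {
    "text_to_sql": text_to_sql,
    "web_search": web_search,
    "pdf_with_citations": pdf_with_citations,
}


@dataclass
class Step:
    tool: Optional[str]  # None means the orchestrator is ready to answer
    text: str            # sub-question for the tool, or the final answer


def run_agent(
    question: str,
    decide: Callable[[str, list[str]], Step],
    max_steps: int = 5,  # assumption: cap the loop so it always terminates
) -> str:
    """Call tools until `decide` returns a final answer or the cap is hit."""
    context: list[str] = []
    for _ in range(max_steps):
        step = decide(question, context)
        if step.tool is None:
            return step.text
        result = TOOLS[step.tool](step.text)
        context.append(f"{step.tool}: {result}")
    return "Stopped after max_steps without a final answer."


def scripted_decide(question: str, context: list[str]) -> Step:
    """Stand-in for the orchestrating LLM: query the database once, then answer."""
    if not context:
        return Step("text_to_sql", question)
    return Step(None, f"Answer based on {len(context)} tool call(s).")


if __name__ == "__main__":
    print(run_agent("How many users signed up last week?", scripted_decide))
```

In a real deployment, `decide` would be a call to the orchestration model, which receives the question plus the accumulated tool results and either picks another tool or returns the answer.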
For development workflows and more detailed instructions, see the README files in the /backend and /frontend directories.
Defog supports most database connectors including PostgreSQL, MySQL, SQLite, BigQuery, Redshift, Snowflake, and Databricks – and also includes support for CSV and Excel files.
- Run all tests: `docker exec introspect-backend pytest`
- Run single test: `docker exec introspect-backend pytest tests/test_file.py::test_function -v`
- Tests use the `agents-postgres` service for database operations
- Create admin user: `docker exec introspect-backend python create_admin_user.py`
- Development server: `cd frontend && npm run dev`
- Build production: `cd frontend && npm run build`
- Export static site: `cd frontend && npm run export`
- Run frontend tests: `cd frontend && npx playwright test`
- Lint (Prettier): `cd frontend && npm run lint`
- It is highly recommended to run this only as a Docker image, for security purposes
This repo is maintained by Defog.ai
- Create Docs
- Let users choose which model they want for each task from the `.env` file
- Docs and examples for how to add custom tools
- Docs and examples for how to integrate with unstructured data sources with search, like Google Drive and OneDrive