GitHub - Sicarius07/Log-Analyzer: Log Analyzer

Backend Implementation

DataFrame Processing: Flattens NDJSON logs into a pandas DataFrame and tags each row with a UUID
Vector Indexing: Indexes the logs in ChromaDB for semantic search
Structured Filtering: Given the user query, the LLM analyzes the DataFrame columns, extracts their unique values, and creates hierarchical filters
Hybrid Search: Combines structured DataFrame filtering with semantic vector search
Final Analysis: GPT-4.1-mini analyzes the filtered logs and returns structured output
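
As a rough illustration of the first step, the sketch below flattens nested NDJSON records into single-level rows and tags each one with a UUID. It uses only the standard library so it runs anywhere; it is not the repository's code. The helper names (`flatten`, `load_ndjson`), the dotted-key convention, and the `log_id` column name are assumptions made for this example.

```python
import json
import uuid


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def load_ndjson(text):
    """Parse NDJSON text into flat rows, each tagged with a fresh UUID."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # NDJSON files may contain blank lines
        row = flatten(json.loads(line))
        row["log_id"] = str(uuid.uuid4())  # hypothetical column name
        rows.append(row)
    return rows


sample = (
    '{"level": "error", "http": {"status": 500, "path": "/pay"}}\n'
    '{"level": "info", "msg": "ok"}\n'
)
rows = load_ndjson(sample)
```

Passing `rows` to `pandas.DataFrame(rows)` would give one column per flattened key, with missing keys filled as NaN; `pandas.json_normalize` offers similar flattening directly.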

Quick Start

Prerequisites

Python 3.8+
Node.js 16+
OpenAI API key

Backend Setup

cd backend
python -m venv venv
source venv/bin/activate  
pip install -r requirements.txt
cp env_example.txt .env
# Edit .env and add your OPENAI_API_KEY
python main.py

Backend runs on: http://localhost:8000

Frontend Setup

cd frontend
npm install
npm start

Frontend runs on: http://localhost:3000

Usage

Upload NDJSON log file (automatic indexing with progress bar)
Describe the incident/issue
Click "Analyze Logs (Fast)" for analysis
View filtered logs with expandable JSON and markdown analysis

Architecture

Backend: FastAPI + ChromaDB + OpenAI + Pandas
Frontend: React + Shadcn/ui + Tailwind CSS (dark theme)
LLM: GPT-4.1-mini

The primary challenge was filtering structured logs based on natural-language user prompts. The subsequent steps, such as analyzing the filtered logs and identifying the most relevant entries, were comparatively straightforward and could be handled with standard LLM calls.

For the filtering process, multiple approaches were explored. A multi-step, tool-call-based filtering mechanism was implemented on a pandas DataFrame. This in-memory method performed well on smaller datasets; a more scalable solution would index and query the logs with Elasticsearch or a similar search system.
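
The structured part of that filtering can be sketched in a few lines. The example below works on a plain list of dicts instead of a DataFrame so it stays dependency-free, and the LLM call is replaced by a hand-written filter spec. The function names, the column names, and the spec shape (column mapped to allowed values) are all assumptions for illustration, not the repository's interface.

```python
def unique_values(rows, column, limit=50):
    """Collect up to `limit` distinct values of a column, in first-seen order.

    These are the values that would be shown to the LLM so it can pick filters.
    """
    seen = []
    for row in rows:
        value = row.get(column)
        if value is not None and value not in seen:
            seen.append(value)
            if len(seen) >= limit:
                break
    return seen


def apply_filters(rows, filters):
    """Keep rows that match every column in `filters`.

    Columns are combined with AND; the allowed values of one column with OR.
    """
    return [
        row
        for row in rows
        if all(row.get(col) in allowed for col, allowed in filters.items())
    ]


logs = [
    {"level": "error", "service": "payments", "status": 500},
    {"level": "info", "service": "payments", "status": 200},
    {"level": "error", "service": "auth", "status": 401},
]

# Stand-in for what the LLM might return after seeing the unique values.
llm_filters = {"level": ["error"], "service": ["payments"]}
matched = apply_filters(logs, llm_filters)
```

With a DataFrame, the same AND-of-ORs filter can be expressed by combining `df[col].isin(allowed)` masks with `&`.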

During experimentation, a large number of near-duplicate logs were observed, which hurt the accuracy of vector semantic search. Clustering the logs before indexing them into the vector database could improve this considerably: initial indexing would take longer, but subsequent analysis and retrieval would become faster and more precise.
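
One inexpensive way to approximate that grouping is to mask the variable parts of each message and group by the resulting template. Note that this is template-based grouping, a simpler stand-in for real clustering, and it was not part of the experiments described above. The regular expressions and function names are assumptions chosen for this sketch.

```python
import re
from collections import defaultdict

# Patterns for the parts of a message that tend to vary between duplicates.
_UUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
_NUMBER = re.compile(r"\d+")


def template(message):
    """Replace UUIDs and digit runs with placeholders."""
    message = _UUID.sub("<id>", message)  # mask UUIDs before bare numbers
    return _NUMBER.sub("<n>", message)


def group_near_duplicates(messages):
    """Map each template to the indices of the messages that share it."""
    groups = defaultdict(list)
    for index, message in enumerate(messages):
        groups[template(message)].append(index)
    return dict(groups)


messages = [
    "Timeout after 30 ms calling payments",
    "Timeout after 45 ms calling payments",
    "User 17 logged in",
]
groups = group_near_duplicates(messages)
```

Only one representative per group would then need to be embedded and indexed, while the stored indices map a hit back to every original row in that group.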

For cost estimation, the per-token price of gpt-4.1-mini is hardcoded.
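
A hardcoded estimate of this kind might look like the sketch below. The two rates are placeholder values assumed for illustration and should be checked against current OpenAI pricing; the constant and function names are likewise hypothetical.

```python
# Assumed placeholder rates in USD per one million tokens; verify before use.
INPUT_USD_PER_1M = 0.40
OUTPUT_USD_PER_1M = 1.60


def estimate_cost_usd(prompt_tokens, completion_tokens):
    """Estimate the cost of one LLM call from its token counts."""
    total = (
        prompt_tokens * INPUT_USD_PER_1M
        + completion_tokens * OUTPUT_USD_PER_1M
    )
    return total / 1_000_000


cost = estimate_cost_usd(prompt_tokens=1000, completion_tokens=500)
```

The token counts would typically come from the usage figures returned with each API response. Keeping the rates in named constants makes them easy to update when pricing or the model changes.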
