A complete end-to-end NLP project that classifies tweets as Positive or Negative using the Sentiment140 Dataset.
This project combines Natural Language Processing, Machine Learning, and MLOps — from data cleaning to model deployment.
✅ Text Preprocessing (cleaning, tokenization, lemmatization)
✅ TF-IDF Vectorization for feature extraction
✅ Multiple ML models with cross-validation & hyperparameter tuning
✅ Streamlit web app for real-time sentiment prediction
✅ Model saving & reusability with joblib
✅ Fully Dockerized for consistent deployment
✅ GitHub Actions CI Workflow for automated testing & build
✅ Kubernetes manifest included for optional cloud deployment
```
├── .github
│   └── workflows
│       └── sentimentlsis.yml
├── Docker-compose.yml
├── Dockerfile
├── dashboard.py
├── manifest.yml
├── requirements.txt
└── src
    ├── preprep.ipynb
    ├── sentiment_model.pkl
    └── tfidf_vectorizer.pkl
```
| Category | Tools / Libraries |
|---|---|
| Language | Python |
| Data Handling | Pandas, NumPy |
| NLP | NLTK, Regex, Emoji |
| Feature Extraction | TF-IDF (sklearn) |
| Modeling | Logistic Regression, SVM, Random Forest |
| App Framework | Streamlit |
| Model Persistence | Joblib |
| Containerization | Docker |
| Automation | GitHub Actions |
| Deployment | Streamlit Cloud / Render / Kubernetes |
- Lowercasing text
- Removing URLs, mentions, hashtags, and punctuation
- Tokenization using nltk
- Stopword removal
- Lemmatization (`WordNetLemmatizer`)
- Emoji handling (`emoji.demojize`)
This ensures the model sees only meaningful words.
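A minimal sketch of this pipeline is shown below; the helper name `clean_tweet` and the exact regex patterns are illustrative, not copied from the notebook:

```python
import re

import emoji
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# One-time NLTK resources (safe to re-run)
nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

STOPWORDS = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def clean_tweet(text: str) -> str:
    """Lowercase, strip noise, tokenize, remove stopwords, lemmatize."""
    text = emoji.demojize(text.lower())              # 😊 -> :smiling_face_with_smiling_eyes:
    text = re.sub(r"http\S+|www\.\S+", " ", text)    # URLs
    text = re.sub(r"[@#]\w+", " ", text)             # mentions and hashtags
    text = re.sub(r"[^a-z\s]", " ", text)            # punctuation, digits, leftover symbols
    tokens = word_tokenize(text)
    return " ".join(lemmatizer.lemmatize(t) for t in tokens if t not in STOPWORDS)
```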
Why TF-IDF?
It represents each tweet as a numerical vector based on word importance.
$$
\mathrm{TFIDF}(w) = \mathrm{TF}(w) \times \log\left(\frac{N}{\mathrm{df}(w)}\right)
$$

where $N$ is the total number of tweets and $\mathrm{df}(w)$ is the number of tweets containing the word $w$.
Used `TfidfVectorizer(max_features=5000, ngram_range=(1, 2))` for the best balance between accuracy and speed.
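A short sketch of the vectorization step with those settings; the variable names and tiny example texts are stand-ins for the preprocessed Sentiment140 splits:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Cleaned tweets from the preprocessing step (tiny stand-in data here)
train_texts = ["love this phone", "battery life is awful", "love the camera"]
val_texts = ["awful screen", "really love it"]

# Same settings as in the project: top 5000 unigram/bigram features
tfidf = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))
X_train_vec = tfidf.fit_transform(train_texts)  # fit on training data only
X_val_vec = tfidf.transform(val_texts)          # reuse the fitted vocabulary at inference
```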
| Model | Description | CV Performance |
|---|---|---|
| Logistic Regression | Simple & effective for text data | ✅ Best |
| SVM | Handles high-dimensional data | Good |
| Random Forest | Captures non-linear patterns | Moderate |
Performed:
- 5-Fold Cross-Validation
- GridSearchCV for hyperparameter tuning
- Evaluation Metrics: Accuracy, Precision, Recall, F1-score
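A minimal sketch of this selection step, continuing from the TF-IDF variables above; the parameter grid and the label arrays (`y_train`, `y_val`) are assumptions, not copied from the notebook:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, cross_val_score

# y_train / y_val are the 0 = negative, 1 = positive labels for the
# vectorized splits (from the Sentiment140 `target` column).

# 5-fold cross-validated accuracy for the baseline model
base_model = LogisticRegression(max_iter=1000)
print(cross_val_score(base_model, X_train_vec, y_train, cv=5, scoring="accuracy").mean())

# Hyperparameter tuning with GridSearchCV (illustrative grid)
grid = GridSearchCV(base_model, {"C": [0.01, 0.1, 1, 10]}, cv=5, scoring="f1")
grid.fit(X_train_vec, y_train)
model = grid.best_estimator_

# Precision / recall / F1 on the held-out validation split
print(classification_report(y_val, model.predict(X_val_vec)))
```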
Used `joblib` to persist the trained model and TF-IDF vectorizer:

```python
import joblib

joblib.dump(model, 'sentiment_model.pkl')
joblib.dump(tfidf, 'tfidf_vectorizer.pkl')
```

The Streamlit app then runs each input through the same pipeline:

- Input tweet text 📝
- Clean & preprocess
- Convert text → TF-IDF vector
- Predict sentiment using model
- Display result (😊 Positive / 😠 Negative)
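A hedged sketch of what that flow could look like in `dashboard.py`, assuming the saved artifacts above and the `clean_tweet` helper from the preprocessing sketch (the actual UI code in the repo may differ):

```python
import joblib
import streamlit as st

# Load the persisted artifacts once at startup
model = joblib.load("src/sentiment_model.pkl")
tfidf = joblib.load("src/tfidf_vectorizer.pkl")

st.title("Tweet Sentiment Classifier")
tweet = st.text_area("Enter a tweet")

if st.button("Predict") and tweet:
    cleaned = clean_tweet(tweet)          # same preprocessing as training
    vector = tfidf.transform([cleaned])   # text -> TF-IDF features
    label = model.predict(vector)[0]      # assumes labels mapped to 0 = negative, 1 = positive
    st.write("😊 Positive" if label == 1 else "😠 Negative")
```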
## 🐳 Docker Integration
```bash
docker build -t sentiment-app .
docker run -p 8501:8501 sentiment-app
```
- Logistic Regression achieved ~85% accuracy on validation data
- Clean UI for sentiment prediction
- Fully automated CI/CD pipeline with Docker integration
- Built a complete ML workflow: from preprocessing → training → deployment
- Learned to ensure preprocessing consistency between training & inference
- Containerized the app for reproducibility
- Automated CI/CD with GitHub Actions
- Gained experience with MLOps fundamentals
```bash
# Clone repo
git clone https://github.com//sentiment-analysis.git
cd sentiment-analysis

# Install dependencies
pip install -r requirements.txt

# Run Streamlit app
streamlit run dashboard.py
```
Or run it with Docker Compose:

```bash
docker-compose up --build
```