Prerequisite: make sure you have uploaded "mail_data.csv" from the repository link to Google Colab before running this file.
This project demonstrates how to build a spam detection system using Logistic Regression in Python. It involves the following steps:
Data Collection and Preprocessing:
Load email data from a CSV file. Handle missing values. Perform label encoding to represent spam as 0 and ham as 1.
Feature Extraction:
Use TF-IDF vectorization to convert text data into numerical features. Remove stop words and convert text to lowercase for better feature representation.
Model Training:
Train a Logistic Regression model on the labeled data.
Model Evaluation:
Evaluate the model's performance on both the training and test datasets using accuracy.
Accuracy on training data = 96.70%
Accuracy on test data = 96.59%
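The steps up to this point can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook: the column names "Category" and "Message", the 75/25 split, the random seed, and the tiny inline sample standing in for "mail_data.csv" are all assumptions, so the accuracies it prints will not match the figures reported above.

```python
import io

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Tiny inline stand-in for mail_data.csv. The column names "Category" and
# "Message" are assumptions; adjust them to match the real file.
SAMPLE_CSV = """\
Category,Message
ham,Are we still meeting for lunch today?
spam,WINNER! Claim your free prize now
ham,Please send me the report by Friday
spam,Free entry in a weekly cash prize draw
ham,I will call you when I get home
spam,Urgent! Your account has won a free gift card
ham,Can you pick up some milk on the way back
spam,Click now to claim your cash reward
ham,The meeting has been moved to Monday
spam,Congratulations you won a free holiday call now
ham,Thanks for the notes from class
spam,Exclusive offer claim your free ringtone today
ham,
"""


def load_and_encode(csv_source):
    """Load the mail data, handle missing values, and encode the labels."""
    df = pd.read_csv(csv_source)
    # Rows without a label cannot be used; empty messages become "".
    df = df.dropna(subset=["Category"]).copy()
    df["Message"] = df["Message"].fillna("")
    # Label encoding as described above: spam -> 0, ham -> 1.
    df["Category"] = df["Category"].map({"spam": 0, "ham": 1}).astype(int)
    return df["Message"], df["Category"]


def train_and_evaluate(messages, labels):
    """Vectorize with TF-IDF, fit Logistic Regression, report accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        messages, labels, test_size=0.25, random_state=3, stratify=labels
    )
    # Lowercase the text and drop English stop words while building features.
    vectorizer = TfidfVectorizer(min_df=1, stop_words="english", lowercase=True)
    X_train_features = vectorizer.fit_transform(X_train)
    # Reuse the vocabulary fitted on the training split for the test split.
    X_test_features = vectorizer.transform(X_test)

    model = LogisticRegression()
    model.fit(X_train_features, y_train)

    train_acc = accuracy_score(y_train, model.predict(X_train_features))
    test_acc = accuracy_score(y_test, model.predict(X_test_features))
    return vectorizer, model, train_acc, test_acc


messages, labels = load_and_encode(io.StringIO(SAMPLE_CSV))
vectorizer, model, train_acc, test_acc = train_and_evaluate(messages, labels)
print(f"Accuracy on training data = {train_acc:.2%}")
print(f"Accuracy on test data = {test_acc:.2%}")
```

To run it against the real data in Colab, pass the uploaded file's path instead of the inline sample, for example load_and_encode("mail_data.csv"). Fitting the vectorizer on the training split only keeps test-set vocabulary from leaking into the features.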
Prediction:
Build a system to predict whether new emails are spam or ham based on the trained model.
This project showcases a basic implementation of a spam detection system using machine learning.
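One possible shape for the prediction step is sketched below. The helper name predict_mail, the LABEL_NAMES mapping, and the handful of inline training messages are hypothetical and exist only to make the example self-contained. It also bundles the vectorizer and the model into a scikit-learn pipeline, which is a convenience chosen here and not necessarily how the project wires them together.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Same encoding as in the preprocessing step: spam -> 0, ham -> 1.
LABEL_NAMES = {0: "spam", 1: "ham"}

# Hypothetical training messages standing in for the real mail data.
train_texts = [
    "WINNER! Claim your free prize now",
    "Free entry in a weekly cash prize draw",
    "Click now to claim your cash reward",
    "Exclusive offer claim your free ringtone today",
    "Are we still meeting for lunch today?",
    "Please send me the report by Friday",
    "I will call you when I get home",
    "The meeting has been moved to Monday",
]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]

# The pipeline applies the fitted TF-IDF transform before every prediction,
# so new emails are vectorized with the training vocabulary.
classifier = make_pipeline(
    TfidfVectorizer(min_df=1, stop_words="english", lowercase=True),
    LogisticRegression(),
)
classifier.fit(train_texts, train_labels)


def predict_mail(text):
    """Return "spam" or "ham" for a single raw email string."""
    label = int(classifier.predict([text])[0])
    return LABEL_NAMES[label]


print(predict_mail("Claim your free cash prize today"))
print(predict_mail("See you at the meeting on Friday"))
```

Keeping the vectorizer and the classifier in one object means a new email only needs to be passed as raw text, with no separate transform call to forget.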