This project focuses on cleaning and preprocessing layoff data to ensure accuracy, consistency, and usability for further analysis. The dataset contains records of company layoffs, including details like company name, location, industry, total layoffs, and funding. The data was cleaned using SQL to remove duplicates, standardize formats, and fix inconsistencies.
The dataset includes the following key columns:
- `company` → Name of the company
- `location` → Company location
- `industry` → Industry sector
- `total_laid_off` → Number of employees laid off
- `percentage_laid_off` → Percentage of workforce laid off
- `date` → Layoff date
- `stage` → Growth stage of the company
- `country` → Country of the company
- `funds_raised_millions` → Funds raised by the company
- Created a staging table, `layoff_stage`, for processing.
- Used `ROW_NUMBER()` to identify duplicate records based on company, location, industry, layoffs, and funding.
- Deleted duplicate records to retain only the first occurrence.
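
The duplicate-removal steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the raw table name `layoffs` is hypothetical, the mapping of "layoffs" and "funding" to `total_laid_off` and `funds_raised_millions` is an assumption, and building `layoff_stage3` directly from `layoff_stage` is also an assumption, since the intermediate steps are not described here.

```sql
-- Assumption: the raw import lives in a table called `layoffs` (hypothetical name).
CREATE TABLE layoff_stage LIKE layoffs;
INSERT INTO layoff_stage SELECT * FROM layoffs;

-- Copy the staging data into a working table and number the rows inside each
-- group of identical values. MySQL cannot DELETE through a CTE, so row_num is
-- stored as a real column. Requires MySQL 8.0+ for window functions.
CREATE TABLE layoff_stage3 AS
SELECT s.*,
       ROW_NUMBER() OVER (
           PARTITION BY s.company, s.location, s.industry,
                        s.total_laid_off, s.funds_raised_millions
       ) AS row_num
FROM layoff_stage AS s;

-- Inspect the duplicates before deleting anything.
SELECT * FROM layoff_stage3 WHERE row_num > 1;

-- Keep only the first row of each group.
DELETE FROM layoff_stage3 WHERE row_num > 1;
```

If MySQL Workbench rejects the `DELETE` because safe update mode is enabled, running `SET SQL_SAFE_UPDATES = 0;` for the session is one way around it.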
- Removed leading/trailing spaces from `company` names.
- Standardized industry names (e.g., mapping every value matching `crypto%` to `crypto`).
- Fixed country names (e.g., removing trailing dots from `United States.`).
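
The standardization steps above might look like the statements below. This is a sketch that assumes the fixes are applied to `layoff_stage3`; the exact `WHERE` filters are illustrative.

```sql
-- Strip leading and trailing spaces from company names.
UPDATE layoff_stage3
SET company = TRIM(company);

-- Collapse every industry label that starts with 'crypto' into one value.
UPDATE layoff_stage3
SET industry = 'crypto'
WHERE industry LIKE 'crypto%';

-- Remove a trailing dot from country names such as 'United States.'.
UPDATE layoff_stage3
SET country = TRIM(TRAILING '.' FROM country)
WHERE country LIKE 'United States%';
```

Running `SELECT DISTINCT industry FROM layoff_stage3 ORDER BY industry;` before and after is a quick way to confirm that the variants have been merged.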
- Converted `date` from text format to a proper DATE type using `STR_TO_DATE()`.
- Altered the column type in `layoff_stage3` to store date values correctly.
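
A possible form of the date conversion is shown below. The format string `'%m/%d/%Y'` is an assumption about how the text dates are written (for example `12/31/2022`); it should be adjusted to match the actual data.

```sql
-- Assumption: dates are stored as text in month/day/year form.
-- Rewrite each value in place as a MySQL date ('YYYY-MM-DD').
UPDATE layoff_stage3
SET `date` = STR_TO_DATE(`date`, '%m/%d/%Y');

-- Once every value parses as a date, change the column type itself.
ALTER TABLE layoff_stage3
MODIFY COLUMN `date` DATE;
```

The two steps are kept separate because changing the column type first would fail on text values that MySQL cannot interpret as dates.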
- Checked for `NULL` values in `industry` and `total_laid_off`.
- Removed records where both `total_laid_off` and `percentage_laid_off` were `NULL` (considered unusable data).
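
The `NULL` checks and the removal of unusable rows could be written along these lines, again as a sketch against `layoff_stage3`:

```sql
-- Rows with no industry recorded.
SELECT *
FROM layoff_stage3
WHERE industry IS NULL;

-- How many rows lack a layoff count.
SELECT COUNT(*) AS missing_total_laid_off
FROM layoff_stage3
WHERE total_laid_off IS NULL;

-- Rows with neither a layoff count nor a percentage carry no layoff
-- information, so they are removed.
DELETE FROM layoff_stage3
WHERE total_laid_off IS NULL
  AND percentage_laid_off IS NULL;
```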
- Dropped unnecessary columns like `row_num` after removing duplicates.
- Ensured all transformations were applied to `layoff_stage3`, which serves as the final cleaned dataset.
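
Dropping the helper column is a single statement. The sketch below assumes `row_num` was stored as a regular column on `layoff_stage3` during deduplication.

```sql
-- row_num was only needed to find duplicates; remove it from the final table.
ALTER TABLE layoff_stage3
DROP COLUMN row_num;

-- Confirm the remaining columns and their types.
DESCRIBE layoff_stage3;
```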
- SQL (MySQL)
- MySQL Workbench
After performing all cleaning operations, the dataset is now:
✅ Free from duplicates
✅ Standardized in formatting
✅ Ready for further analysis
- Clone this repository or copy the SQL scripts.
- Run the SQL queries in MySQL Workbench or any other MySQL-compatible client.
- Use the cleaned `layoff_stage3` table for analysis.
🔹 Further validation checks for incorrect entries.
🔹 Automating the cleaning process using stored procedures.
🔹 Exploring data visualization for deeper insights.
📩 Questions? Suggestions? Feel free to reach out! 🚀