Airbnb-NYC-Listings-Data-Analysis

This project analyzes a dataset of Airbnb listings in NYC to identify popular neighborhoods and the busiest hosts, and to predict rental prices using several regression models.

Project Goals

Analyze Popular Neighborhoods: Identify the busiest hosts and their top neighborhood groups in terms of listings and price.

Predict Rental Prices: Build and compare regression models to predict Airbnb rental prices.

Data Analysis and Key Findings

The dataset contains information about Airbnb listings in NYC, including details about the host, location, room type, price, and availability.

Data Cleaning: Missing values in 'name', 'host_name', and 'reviews_per_month' were handled, and the 'last_review' column was dropped due to a large number of missing values.
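The cleaning step above might look roughly like the following. This is a minimal sketch on a made-up three-row frame: the README does not say how the missing values were handled, so filling 'reviews_per_month' with 0 and the text columns with "Unknown" is an assumption, and `clean_listings` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for AB_NYC_2019.csv; the values are made up for illustration.
df = pd.DataFrame({
    "name": ["Cozy loft", None, "Sunny room"],
    "host_name": ["Ana", "Ben", None],
    "last_review": ["2019-05-01", None, None],
    "reviews_per_month": [1.2, np.nan, np.nan],
})


def clean_listings(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop 'last_review' and fill the remaining gaps (assumed strategy)."""
    out = frame.drop(columns=["last_review"])
    # Assumption: a listing without reviews has 0 reviews per month.
    out["reviews_per_month"] = out["reviews_per_month"].fillna(0)
    # Assumption: missing text fields get a placeholder label.
    out[["name", "host_name"]] = out[["name", "host_name"]].fillna("Unknown")
    return out


cleaned = clean_listings(df)
print(cleaned.isna().sum())
```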

Outlier Treatment: Extreme values in numerical columns such as 'price', 'minimum_nights', 'number_of_reviews', 'reviews_per_month', 'calculated_host_listings_count', and 'availability_365' were capped using the Interquartile Range (IQR) method.
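As an illustration of IQR capping, the sketch below clips a column to the range [Q1 - k*IQR, Q3 + k*IQR]. The multiplier k = 1.5 is the conventional choice and is assumed here, since the README does not state the value used; `cap_outliers_iqr` is a hypothetical helper name and the prices are made up.

```python
import pandas as pd


def cap_outliers_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to those bounds."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# Made-up prices with one extreme value.
prices = pd.Series([50, 60, 70, 80, 90, 100, 10000])
capped = cap_outliers_iqr(prices)
print(capped.tolist())
```

Capping keeps every row in the dataset, unlike dropping outliers, at the cost of piling the extreme listings up at the boundary value.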

Busiest Host: The host with the most listings is "Sonder (NYC)" with 327 listings, primarily in Manhattan.
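A finding like this can be derived by counting listings per host and then looking at where that host's listings are located. The sketch below uses made-up rows; the column name 'neighbourhood_group' is an assumption about the dataset's schema, and hosts are grouped by 'host_name' for simplicity.

```python
import pandas as pd

# Made-up rows standing in for the real listings table.
listings = pd.DataFrame({
    "host_name": ["Host A", "Host A", "Host A", "Host B", "Host B", "Host C"],
    "neighbourhood_group": [
        "Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Brooklyn", "Queens",
    ],
})

# Number of listings per host, then the host with the most listings.
counts = listings["host_name"].value_counts()
busiest_host = counts.idxmax()

# Most common neighbourhood group among that host's listings.
host_rows = listings[listings["host_name"] == busiest_host]
top_group = host_rows["neighbourhood_group"].mode().iloc[0]

print(busiest_host, counts.max(), top_group)
```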

Popular Neighborhoods: Manhattan and Brooklyn have the highest number of listings. Manhattan also has higher rental prices, likely due to its popularity as a tourist destination.

Price Prediction

The project explored several regression models to predict the price of Airbnb listings:

Linear Regression: A baseline model was trained, and its performance was evaluated using cross-validation with different numbers of folds (k=3, 5, and 10).
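The cross-validation setup described above could be sketched as follows with scikit-learn. The features and target are synthetic stand-ins rather than the Airbnb data, and the shuffled `KFold` with a fixed seed is an assumption, not necessarily what the notebook does.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data: a linear signal plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([30.0, -10.0, 5.0]) + 100 + rng.normal(scale=5.0, size=200)

cv_mse = {}
for k in (3, 5, 10):
    folds = KFold(n_splits=k, shuffle=True, random_state=0)
    # scikit-learn reports negated MSE so that higher is always better.
    scores = cross_val_score(
        LinearRegression(), X, y, cv=folds, scoring="neg_mean_squared_error"
    )
    cv_mse[k] = -scores.mean()
    print(f"k={k}: mean CV MSE = {cv_mse[k]:.2f}")
```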

Other Regression Models: Support Vector Regressor (SVR), Random Forest Regressor, and Gradient Boosting Regressor were trained and compared.

Model Performance Comparison (Mean Squared Error, MSE, and Root Mean Squared Error, RMSE)

Random Forest Regressor: MSE: 2964.89, RMSE: 54.45

Gradient Boosting Regressor: MSE: 3098.93, RMSE: 55.67

Support Vector Regressor: MSE: 7287.29, RMSE: 85.37
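A comparison of this kind can be produced by fitting each model on the same training split and scoring it on the same held-out split. The sketch below uses synthetic data and mostly default hyperparameters, both of which are assumptions, so its numbers will not match the results listed above. RMSE is simply the square root of MSE and is expressed in the same unit as the price.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in data with a non-linear component.
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(300, 4))
y = 100 + 80 * X[:, 0] + 40 * np.sin(6 * X[:, 1]) + rng.normal(scale=10, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Random Forest Regressor": RandomForestRegressor(n_estimators=50, random_state=42),
    "Gradient Boosting Regressor": GradientBoostingRegressor(random_state=42),
    "Support Vector Regressor": SVR(),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    results[name] = {"MSE": mse, "RMSE": float(np.sqrt(mse))}
    print(f"{name}: MSE: {mse:.2f}, RMSE: {results[name]['RMSE']:.2f}")
```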

Conclusion

The Random Forest Regressor performed best among the evaluated models, achieving the lowest MSE and RMSE. This suggests that tree-based ensemble methods are well-suited for this price prediction task. Further improvements could potentially be made by hyperparameter tuning the Random Forest model.
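One possible way to approach the tuning mentioned above is a cross-validated grid search. This is only a sketch: the data is synthetic and the parameter grid is an arbitrary small example, not a recommendation derived from this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data.
rng = np.random.default_rng(0)
X = rng.uniform(size=(150, 3))
y = 100 + 80 * X[:, 0] + rng.normal(scale=10, size=150)

# Small illustrative grid (assumed values).
param_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

# best_score_ is negated MSE, so flip the sign to report MSE.
best_mse = -search.best_score_
print(search.best_params_, f"CV MSE = {best_mse:.2f}")
```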

Files

AB_NYC_2019.csv: The dataset used for this analysis.

Airbnb_NYC_Listings_EDA.ipynb: The notebook containing the code for data loading, analysis, visualization, and model training/evaluation.
