
Action Recognition using Vision Transformer (ViT)

This project implements action recognition on videos using the Vision Transformer (ViT) model. It includes a Streamlit-based web application for uploading videos, predicting actions, and visualizing Grad-CAM heatmaps.


Features

  • Train and evaluate a Vision Transformer (ViT) model for video classification.
  • Web interface for uploading videos and predicting actions.
  • Grad-CAM visualizations for model interpretability.

Project Structure

action-recognition-vit
├── src
│   ├── models
│   │   └── vit.py          # Implementation of the Vision Transformer model
│   ├── training
│   │   ├── train.py        # Training script for the ViT model
│   │   └── dataset.py      # Dataset class for loading and preprocessing video data
│   ├── evaluation
│   │   └── evaluate.py     # Evaluation script for assessing model performance
│   ├── web
│   │   └── app.py          # Web application for user interaction
│   └── utils
│       └── helpers.py      # Utility functions for data processing and visualization
├── requirements.txt         # List of project dependencies
├── README.md                # Project documentation
└── .gitignore               # Files and directories to ignore in Git

Installation

Follow these steps to set up the project on your local machine:

1. Clone the Repository

Clone the repository to your local machine:

git clone https://github.com/your-repo/action-recognition-vit.git
cd action-recognition-vit

2. Set Up a Virtual Environment

Create and activate a virtual environment to manage dependencies:

# Create a virtual environment
python -m venv .venv

# Activate the virtual environment
# On Windows:
.\.venv\Scripts\activate
# On macOS/Linux:
source .venv/bin/activate

3. Install Dependencies

Install the required Python packages:

pip install --upgrade pip
pip install -r requirements.txt

Usage

1. Training the Model

To train the Vision Transformer model on your dataset, run the following command:

python src/training/train.py
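
The exact flags and data layout are defined in src/training/train.py. As a rough orientation, a fine-tuning run of this kind typically looks like the sketch below; the torchvision backbone, the VideoDataset class name, and the data/train path are illustrative assumptions, not the project's actual interface.

# Illustrative sketch only: the backbone, dataset class, and paths below are
# assumptions; the project's real model lives in src/models/vit.py and the
# real dataset class in src/training/dataset.py.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision.models import vit_b_16, ViT_B_16_Weights

from src.training.dataset import VideoDataset  # hypothetical class name


def main(num_classes: int = 10, epochs: int = 5) -> None:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Start from ImageNet weights and swap in a fresh classification head.
    model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
    model.heads.head = nn.Linear(model.heads.head.in_features, num_classes)
    model.to(device)

    # Assumes the dataset yields (frame, label) pairs with 224x224 frames.
    loader = DataLoader(VideoDataset("data/train"), batch_size=8, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
    criterion = nn.CrossEntropyLoss()

    model.train()
    for epoch in range(epochs):
        for frames, labels in loader:            # frames: (B, 3, 224, 224)
            frames, labels = frames.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(frames), labels)
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch + 1}/{epochs}  last-batch loss {loss.item():.4f}")

    torch.save(model.state_dict(), "vit_action.pt")


if __name__ == "__main__":
    main()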

2. Evaluating the Model

After training, evaluate the model's performance using:

python src/evaluation/evaluate.py
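
Evaluation boils down to a top-1 accuracy check over a held-out split. The sketch below shows the idea; the VideoDataset import is the same hypothetical stand-in as above, not the script's exact interface.

# Sketch of a top-1 accuracy check over a validation split.
import torch
from torch.utils.data import DataLoader

from src.training.dataset import VideoDataset  # hypothetical class name


@torch.no_grad()
def evaluate(model: torch.nn.Module, loader: DataLoader, device: torch.device) -> float:
    model.eval()
    correct = total = 0
    for frames, labels in loader:
        frames, labels = frames.to(device), labels.to(device)
        preds = model(frames).argmax(dim=1)        # highest-scoring class
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total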

3. Running the Web Interface

The project includes a Streamlit-based web application for uploading videos and predicting actions.

Steps to Run the App:

  1. Ensure the virtual environment is activated (the Windows command is shown; on macOS/Linux use source .venv/bin/activate):

    .\.venv\Scripts\activate
  2. Start the Streamlit app:

    streamlit run src/web/app.py
  3. Open the local URL printed in the terminal (http://localhost:8501 by default), or try the hosted demo: https://action-recognition-using-vit.streamlit.app/
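
Under the hood, the app broadly follows an upload → preview → predict flow. The skeleton below is a hedged sketch of that flow, not the repository's actual code; predict_action is a placeholder stub standing in for the real inference helper in src/web/app.py.

# Rough skeleton of the app flow; the real implementation is src/web/app.py.
import tempfile

import streamlit as st


def predict_action(video_path: str) -> str:
    """Placeholder for the real model call (frame extraction + ViT inference)."""
    return "(model prediction goes here)"


st.title("Action Recognition using Vision Transformer")

uploaded = st.sidebar.file_uploader("Upload a video", type=["mp4", "avi", "mov"])
if uploaded is not None:
    # Persist the in-memory upload so downstream readers (e.g. OpenCV) get a path.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp4") as tmp:
        tmp.write(uploaded.read())
        video_path = tmp.name

    st.video(video_path)                      # preview the uploaded clip
    st.success(f"Predicted action: {predict_action(video_path)}")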


Using the Streamlit App

Features of the App:

  • Upload Videos: Upload a video file in .mp4, .avi, or .mov format.
  • Action Prediction: The app predicts the action in the video using the Vision Transformer model.
  • Grad-CAM Visualizations: Visualize Grad-CAM heatmaps to understand which parts of the video influenced the model's predictions (see the sketch below).
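
For ViT backbones, Grad-CAM can be computed by hooking a late transformer block, reshaping the patch tokens back into a 2D grid, and weighting them by their pooled gradients. The sketch below illustrates this idea only; the block choice, the 14×14 grid (224px input with 16px patches), and the hook-based approach are assumptions about, not a copy of, the repository's implementation in src/utils/helpers.py.

# Minimal Grad-CAM sketch for a ViT backbone (illustrative, not the repo's code).
# Usage with a torchvision-style ViT might look like:
#   cam = vit_grad_cam(model, model.encoder.layers[-1], frame_tensor)
import torch
import torch.nn.functional as F


def vit_grad_cam(model, block, image, class_idx=None, grid=14):
    """image: (1, 3, 224, 224); grid: patch-grid side for a 16px-patch ViT."""
    acts, grads = {}, {}
    h1 = block.register_forward_hook(lambda m, i, o: acts.update(t=o))
    h2 = block.register_full_backward_hook(lambda m, gi, go: grads.update(t=go[0]))

    logits = model(image)
    idx = class_idx if class_idx is not None else logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, idx].backward()
    h1.remove(); h2.remove()

    # Drop the CLS token and reshape patch tokens into a (D, H, W) feature map.
    a = acts["t"][0, 1:].reshape(grid, grid, -1).permute(2, 0, 1)
    g = grads["t"][0, 1:].reshape(grid, grid, -1).permute(2, 0, 1)
    weights = g.mean(dim=(1, 2))                       # one weight per channel
    cam = F.relu((weights[:, None, None] * a).sum(0))  # weighted channel sum
    cam = cam / (cam.max() + 1e-8)                     # normalize to [0, 1]
    return F.interpolate(cam[None, None], size=image.shape[-2:],
                         mode="bilinear")[0, 0].detach()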

Example Workflow:

  1. Upload a Video:

    • Use the sidebar to upload a video file.
    • Supported formats: .mp4, .avi, .mov.
  2. View Uploaded Video:

    • The uploaded video is displayed in the main interface.
  3. Prediction and Visualization:

    • The app extracts frames from the video and processes them through the ViT model (a frame-sampling sketch follows this list).
    • The predicted action is displayed, and Grad-CAM heatmaps are generated for interpretability.
  4. Interact with Results:

    • View Grad-CAM heatmaps for each frame to understand the model's focus areas.
    • Upload another video to repeat the process.
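
For reference, uniform frame sampling with OpenCV typically looks like the sketch below. The helper name, frame count, and 224×224 resize are illustrative assumptions; the app's real preprocessing lives in src/utils/helpers.py.

# Sketch of uniform frame sampling with OpenCV (illustrative only).
import cv2
import numpy as np


def extract_frames(video_path: str, num_frames: int = 8) -> np.ndarray:
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Pick evenly spaced frame indices across the whole clip.
    indices = np.linspace(0, max(total - 1, 0), num_frames).astype(int)

    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # OpenCV loads BGR
        frames.append(cv2.resize(frame, (224, 224)))     # ViT input size
    cap.release()
    return np.stack(frames)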

Troubleshooting

  • Dependencies Not Installed: Ensure all dependencies are installed:

    pip install -r requirements.txt
  • Streamlit App Not Starting: Ensure the virtual environment is activated and all dependencies are installed.

  • CUDA Issues: If using a GPU, ensure PyTorch is installed with CUDA support:

    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
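
  • Verify GPU Visibility: To confirm that PyTorch can see the GPU, run:

    python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"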

Screenshots

Predicted Action

(Screenshot: the app showing the predicted action for an uploaded video.)

Grad-CAM Heatmaps

(Screenshot: Grad-CAM heatmaps over the sampled frames.)


Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue for any suggestions or improvements.


License

This project is licensed under the MIT License. See the LICENSE file for details.

About

This app takes an uploaded video, converts it into frames, and predicts the action using a pre-trained model. We use TimeSformer, a state-of-the-art video transformer model, which processes video frames as a sequence of images and captures temporal relationships to predict actions effectively. Experience seamless action recognition with visual explanations.
