Desktop-assistant-AI is an AI-powered desktop assistant designed to help users, especially coders, when they are unsure what to do next. It was originally intended to capture screenshots or screen recordings from the PC and pass them to the AI for analysis. It is not currently a useful product.
Platform: Windows only
- AI-powered help for coding and general desktop tasks
- Screenshot capture and context-aware assistance
- Integration with OpenAI (ChatGPT) and Whisper for speech recognition
- Text-to-speech responses
- Secure model loading with progress feedback
- Modern PyQt5 GUI
This project uses a PowerShell script to automate the setup process. It will check for the required Python version (3.11), create a virtual environment, and install all the necessary dependencies.
- Download or clone this repository.
- Run the setup script. Open a PowerShell terminal and run the following command:
  .\setup.ps1
  The script will guide you through the setup process. If you don't have Python 3.11 installed, it will offer to install it for you from the Microsoft Store.
- Run the application. Once the setup is complete, you can launch the application with:
  .\run.ps1
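The steps the setup script is described as performing (check for Python 3.11, create a virtual environment, install dependencies) can be sketched in Python as follows. This is an illustration only, not the contents of setup.ps1: the function names, the `python` executable name, and the `requirements.txt` file name are assumptions made for the example.

```python
import sys
from pathlib import PureWindowsPath

REQUIRED = (3, 11)  # Python version named in this README


def python_ok(version_info=sys.version_info):
    """Return True when the given interpreter version is Python 3.11.x."""
    return tuple(version_info[:2]) == REQUIRED


def setup_commands(venv_dir=".venv", python_exe="python",
                   requirements="requirements.txt"):
    """Build (but do not run) the commands a setup step might issue.

    'requirements.txt' is a hypothetical file name, not taken from the project.
    """
    # On Windows, a virtual environment keeps its interpreter under Scripts\.
    venv_python = str(PureWindowsPath(venv_dir) / "Scripts" / "python.exe")
    return [
        [python_exe, "-m", "venv", venv_dir],
        [venv_python, "-m", "pip", "install", "-r", requirements],
    ]
```

Each returned command could then be passed to `subprocess.run(cmd, check=True)`. Building the commands separately from running them keeps the logic easy to inspect before anything is executed.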
See also: gotchas.md for troubleshooting common installation issues.
After installation, launch the assistant using run.ps1. The app will show a loading screen while the Whisper and Coqui TTS models load, then present the main window for interaction.
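The loading-screen behaviour described above (models load in the background while progress is reported, then the main window appears) can be sketched with the standard library alone. This is a hedged illustration, not the application's code: a PyQt5 app would more likely use a QThread with signals, and the loader functions here are placeholders standing in for the real Whisper and Coqui TTS loading calls.

```python
import queue
import threading
import time


def load_models(loaders, progress):
    """Run each loader in turn, posting (name, done, total) after each one."""
    total = len(loaders)
    try:
        for done, (name, loader) in enumerate(loaders, start=1):
            loader()
            progress.put((name, done, total))
    finally:
        progress.put(None)  # sentinel: loading finished (or failed)


def run_with_progress(loaders):
    """Load in a worker thread and collect progress messages on the caller's thread."""
    progress = queue.Queue()
    worker = threading.Thread(target=load_models, args=(loaders, progress),
                              daemon=True)
    worker.start()
    messages = []
    while True:
        item = progress.get()
        if item is None:
            break
        messages.append("Loaded %s (%d/%d)" % item)
    worker.join()
    return messages


# Hypothetical placeholder loaders; the real ones would load model weights.
def fake_whisper():
    time.sleep(0.01)


def fake_tts():
    time.sleep(0.01)
```

Calling `run_with_progress([("Whisper", fake_whisper), ("Coqui TTS", fake_tts)])` returns one message per loaded model; a GUI would update its loading screen with each message and open the main window once the sentinel arrives.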
To set up a development environment, simply follow the installation instructions above. The setup.ps1 script will create a self-contained virtual environment in the .venv directory, which you can use for development.
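When developing, it can help to confirm that code is actually running inside the virtual environment rather than the system interpreter. A minimal sketch, assuming a standard `venv`-style environment; the function name is hypothetical and not part of the project:

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when running inside a venv.

    Inside a venv, sys.prefix points at the environment while
    sys.base_prefix points at the interpreter it was created from.
    The parameters exist only so the check can be exercised with fixed values.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix
```

Calling `in_virtualenv()` with no arguments checks the current interpreter.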
- src/: Main source code
- resources/: Images and logos
- run.bat, compile.bat: Windows scripts for running and compiling
- download_dependencies.sh: Dependency installer for Linux
See "The nature of the security vulnerability.md" for details on a PowerShell script parser vulnerability related to speculative execution in batch scripts. This project is designed with security in mind, but always review scripts before running them.
MIT License (see LICENSE file if present)
- PyQt5 - The GUI framework used
- OpenAI - For ChatGPT and Whisper integration
- Coqui TTS - For text-to-speech
- PyAudio - For audio I/O
- Silero VAD - For voice activity detection
v0.2 - PyQt5 GUI with Coqui TTS.
v0.1 - Command line with pyttsx3.