
Conversational AI Hackathon @ Edinburgh University 🎙️🤖

11th November 2024

Welcome to the Conversational AI Hackathon, hosted at Edinburgh University! 🚀 This event is your opportunity to dive into the exciting world of Conversational AI and build real-time, voice-driven applications that tackle innovative challenges.

Email {sohaib, jiameng, rachel}@neuphonic.com if you need any help!


Challenge

Build a real-time conversational AI solution that delivers seamless, voice-driven interactions for innovative use cases. Your goal is to combine state-of-the-art components to create a functional, impactful system.


Judging Criteria

Your project will be judged based on the following criteria:

  1. Functionality (15%)

    • How well does the solution perform its intended task?
    • Does the conversational AI respond appropriately and handle various inputs effectively?
  2. Innovation & Creativity (40%)

    • Is the idea unique, or does it improve upon existing solutions?
    • Does it demonstrate creative use of conversational AI technology?
  3. User Experience (15%)

    • Is the AI interaction intuitive and engaging for users?
    • Are the responses natural and contextually relevant?
  4. Impact & Applicability (30%)

    • How well does the solution address a real-world problem?
    • Can the project be scaled or adapted for broader use cases?

Table of Contents

  1. Introduction
  2. Setup
  3. Project Structure
  4. Code Overview
  5. How to Run
  6. Challenges & Ideas
  7. Contribution Guidelines
  8. License

Introduction

The hackathon is designed to give you hands-on experience with:

  • Automatic Speech Recognition (ASR): Using Whisper to transcribe speech.
  • Text-to-Speech (TTS): Utilizing Neuphonic's API for voice synthesis.
  • Large Language Models (LLMs): Building conversational AI with lightweight, high-performance models.

Setup

Prerequisites

You will need Python 3.8+, pip, and Ollama with the llama3.1:latest model installed.

Ollama is a tool for running LLMs locally on your laptop. Download it from https://ollama.com, start the application, then enter the command ollama run llama3.1:latest in your terminal to download and run the Llama 3.1 8B model locally.
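
Once the model is running, you can sanity-check it from Python. The snippet below is a minimal sketch, assuming the official ollama Python package (pip install ollama) is installed; it is not part of the starter code.

import ollama

# Send a single prompt to the locally running Llama 3.1 model.
response = ollama.chat(
    model="llama3.1:latest",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response["message"]["content"])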

Installation

Clone the repository:

git clone https://github.com/neuphonic/edinburgh_hackathon.git
cd edinburgh_hackathon

Create a virtual environment and install the dependencies:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Project Structure

├── whisper_speech_recognition.py   # ASR module
├── neuphonic_texttospeech.py       # TTS module
├── ollama_llm.py                   # LLM chat module
├── main.py                         # Main integration program
├── README.md                       # Documentation
└── requirements.txt                # Dependencies

Code Overview

Whisper ASR

The Whisper ASR module converts real-time speech to text.

Key Functionality:

  • speech_recognition(): Streams microphone audio and transcribes it in real time, detecting sentence completions.

Test this out with

python whisper_speech_recognition.py
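
To give a rough idea of what happens under the hood, here is a simplified sketch of a single record-then-transcribe cycle using openai-whisper and sounddevice. The repository's module streams continuously and detects sentence completions, so treat this as an illustration only.

import sounddevice as sd
import whisper

SAMPLE_RATE = 16_000  # Whisper expects 16 kHz mono audio
DURATION = 5          # seconds to record in this simplified example

model = whisper.load_model("base")

# Record a fixed-length clip from the default microphone.
audio = sd.rec(int(DURATION * SAMPLE_RATE), samplerate=SAMPLE_RATE,
               channels=1, dtype="float32")
sd.wait()

# Whisper accepts a 1-D float32 NumPy array sampled at 16 kHz.
result = model.transcribe(audio.flatten(), fp16=False)
print(result["text"])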

Text-to-Speech (TTS)

The Text-to-Speech module leverages Neuphonic’s API for generating high-quality audio.

Key Functionality:

  • neuphonic_tts(input_text): Converts input text into speech and plays it.

Test this out with

python neuphonic_texttospeech.py
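
For reference, a TTS call with Neuphonic's pyneuphonic SDK can look roughly like the sketch below. The exact class names, config options, and the environment variable holding your API key are assumptions here; check the SDK documentation and the module in this repository for authoritative usage.

import os

from pyneuphonic import Neuphonic, TTSConfig
from pyneuphonic.player import AudioPlayer

# Assumes the API key is exported as NEUPHONIC_API_KEY (the variable
# name may differ; see the SDK docs).
client = Neuphonic(api_key=os.environ["NEUPHONIC_API_KEY"])
sse = client.tts.SSEClient()

# Stream the synthesized audio and play it as chunks arrive.
with AudioPlayer() as player:
    response = sse.send("Hello from the hackathon!", tts_config=TTSConfig())
    player.play(response)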

Large Language Model (LLM)

The LLM module provides conversational responses using a lightweight language model.

Key Functionality:

  • language_model_chat(user_input): Processes conversational context and generates concise, friendly replies.

Test this out with

python ollama_llm.py
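
Conversational context is typically kept by accumulating the message history and resending it on every turn. The sketch below illustrates that pattern with the ollama package; the repository's language_model_chat may differ in its details.

import ollama

# Shared history; the system prompt steers the model's tone.
messages = [
    {"role": "system",
     "content": "You are a friendly assistant. Keep replies concise."},
]

def language_model_chat(user_input: str) -> str:
    # Append the user's turn, query the model with the full history,
    # then store the reply so later turns stay in context.
    messages.append({"role": "user", "content": user_input})
    response = ollama.chat(model="llama3.1:latest", messages=messages)
    reply = response["message"]["content"]
    messages.append({"role": "assistant", "content": reply})
    return reply

print(language_model_chat("What's the capital of Scotland?"))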

Main Program

The main program integrates ASR, LLM, and TTS for a seamless conversational experience (a sketch of the loop follows the list below):

  1. Transcribe speech, generate responses, and convert them to speech.
  2. Engage in real-time conversation.
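
Conceptually, main.py wires the three modules into a loop like the sketch below. The function names come from the module descriptions above, but their exact signatures (e.g. speech_recognition() returning the transcribed text) are assumptions; read main.py for the real control flow.

from whisper_speech_recognition import speech_recognition
from ollama_llm import language_model_chat
from neuphonic_texttospeech import neuphonic_tts

while True:
    # 1. Listen until a complete sentence has been transcribed.
    user_text = speech_recognition()

    # 2. Generate a conversational reply with the local LLM.
    reply = language_model_chat(user_text)

    # 3. Speak the reply aloud via Neuphonic TTS.
    neuphonic_tts(reply)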

How to Run

  1. Start the main program:
   python main.py

Interaction Flow

  1. Speak into the microphone, and the system will transcribe your speech in real time.
  2. The transcribed text is sent to the LLM to generate a response.
  3. The response is converted to speech using the TTS module and played back to you.
  4. Repeat the process to continue the conversation.

Challenges & Ideas

Challenges

  • Real-time performance: Ensure smooth, low-latency interactions (one common technique is sketched after this list).
  • Robustness: Handle varied accents, speech rates, and noisy environments.
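
One common low-latency technique is to stream tokens from the LLM and hand each completed sentence to TTS immediately, instead of waiting for the full reply. A sketch with the ollama package follows; speak is a hypothetical callable that sends text to TTS (e.g. neuphonic_tts).

import re

import ollama

SENTENCE_END = re.compile(r"[.!?]\s")

def stream_reply_to_tts(messages, speak):
    # Stream tokens from the model, flushing each finished sentence
    # to the TTS callback as soon as it completes.
    buffer = ""
    for chunk in ollama.chat(model="llama3.1:latest",
                             messages=messages, stream=True):
        buffer += chunk["message"]["content"]
        while (match := SENTENCE_END.search(buffer)):
            sentence, buffer = buffer[:match.end()], buffer[match.end():]
            speak(sentence.strip())
    if buffer.strip():
        speak(buffer.strip())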

Project Ideas

  • Virtual Assistant: Build a personalized voice assistant.
  • Interactive Learning: Develop a language learning app.
  • Accessibility Tool: Create tools for users with disabilities.
  • News Summarisation: Fetch the latest news, generate concise summaries, and deliver them as personalized audio clips.
  • Dynamic Storytelling: Create interactive audiobooks, with the story adapting based on mood or context.
  • TTS Fitness Coach: Build a virtual fitness coach that provides real-time, motivational voice instructions during workouts.
  • AI Audioguides: Design a tool for generating personalized audioguides for museums or attractions.

Contribution Guidelines

All contributions during the hackathon should be:

  • Clearly documented.
  • Tested to ensure compatibility with the main system.

License

This project is licensed under the MIT License. See LICENSE for more details.


Happy Hacking! 🎉
