Hume tts service #2518
src/pipecat/services/hume/tts.py
Outdated
using the Python SDK and emits `TTSAudioRawFrame`s suitable for Pipecat transports.
Parameters
----------
Check out the docstrings convention guide:
https://github.com/pipecat-ai/pipecat/blob/main/CONTRIBUTING.md#code-style-and-documentation
try:
    # Instant mode is always enabled here (not user-configurable)
    async for chunk in self._client.tts.synthesize_json_streaming(
From the TTSService class, there's a chunk_size @property.
@property
def chunk_size(self) -> int:
    """Get the recommended chunk size for audio streaming.

    This property indicates how much audio we download (from TTS services
    that require chunking) before we start pushing the first audio
    frame. This will make sure we download the rest of the audio while audio
    is being played without causing audio glitches (specially at the
    beginning). Of course, this will also depend on how fast the TTS service
    generates bytes.

    Returns:
        The recommended chunk size in bytes.
    """
    CHUNK_SECONDS = 0.5
    return int(self.sample_rate * CHUNK_SECONDS * 2)  # 2 bytes/sample
We've found this works well to avoid audio glitches during playback. Using the property also lets us adjust all HTTP-based services uniformly.
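To illustrate what buffering audio by `chunk_size` amounts to, here is a minimal, framework-free sketch. It assumes 16-bit mono PCM (the `* 2` bytes per sample in the property above) and uses hypothetical helper names, `chunk_size_for` and `rechunk_pcm`; inside the service you would read `self.chunk_size` rather than recompute it, and how the Hume SDK's streamed pieces map onto this is an assumption.

```python
from typing import Iterable, Iterator

CHUNK_SECONDS = 0.5  # same value as the chunk_size property above
BYTES_PER_SAMPLE = 2  # 16-bit PCM, one channel


def chunk_size_for(sample_rate: int) -> int:
    """Bytes of 16-bit mono PCM covering CHUNK_SECONDS of audio."""
    return int(sample_rate * CHUNK_SECONDS * BYTES_PER_SAMPLE)


def rechunk_pcm(pieces: Iterable[bytes], chunk_size: int) -> Iterator[bytes]:
    """Re-slice arbitrarily sized PCM pieces into chunk_size-byte chunks.

    The remainder left when the stream ends is emitted as a shorter final
    chunk, so no audio is dropped.
    """
    buffer = bytearray()
    for piece in pieces:
        buffer.extend(piece)
        while len(buffer) >= chunk_size:
            yield bytes(buffer[:chunk_size])
            del buffer[:chunk_size]
    if buffer:
        yield bytes(buffer)


size = chunk_size_for(48000)  # illustrative 48 kHz rate -> 48000 bytes per 0.5 s
stream = [b"\x00" * 30000, b"\x00" * 30000, b"\x00" * 10000]
print([len(c) for c in rechunk_pcm(stream, size)])  # [48000, 22000]
```

Re-slicing to a fixed size means each pushed frame carries the same amount of audio regardless of how the upstream service happens to split its bytes.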
Generally looks good! Can you also add an example, following the 07-interruptible pattern? Also, add this to the evals list here:

We use the foundational examples for evals, and they're also helpful discovery points for developers trying out services. If you haven't done so already, make sure to lint your code. You can install the pre-commit hook using `uv run pre-commit install`.

Last two things:
@zgreathouse I've addressed the feedback from @markbackman and created this PR: zgreathouse#1. We still need to troubleshoot the example: I see all the responses in the terminal, but not in the Pipecat UI in the browser. Once that's fixed, we should be good to go (I hope).
Can you please rebase the PR to resolve the conflicts?
To get text to appear in the console, you need to add an RTVIProcessor and observer. You can see both in use in the quickstart bot file:
https://github.com/pipecat-ai/pipecat/blob/main/examples/quickstart/bot.py#L84-L105
We haven't included RTVI in these examples to keep them simple, so this is probably a non-issue. I'll review the PR shortly. Thanks for the quick fixes!
pyproject.toml
Outdated
webrtc = [ "aiortc~=1.11.0", "opencv-python~=4.11.0.86" ]
websocket = [ "websockets>=13.1,<15.0", "fastapi>=0.115.6,<0.117.0" ]
whisper = [ "faster-whisper~=1.1.1" ]
fastapi = [
Can you explain this addition? I think it can be removed.
"""
return self._sample_rate

@property
Was this removed intentionally? This should remain.
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
audio_out_sample_rate=HUME_SAMPLE_RATE,
Remove. Sample rates should be set in the PipelineParams, not in individual services.
pipeline,
params=PipelineParams(
    enable_metrics=True,
    enable_usage_metrics=True,
Set the sample rate here.
  enable_usage_metrics=True,
+ audio_out_sample_rate=HUME_SAMPLE_RATE,
This will set the sample rate to HUME_SAMPLE_RATE for all processors that output audio.
from loguru import logger

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import StartFrame
Remove. Unused.
src/pipecat/services/hume/tts.py
Outdated
yield TTSStoppedFrame()


__all__ = ["HumeTTSService"]
Remove this.
The pattern is to import as:
from pipecat.services.hume.tts import HumeTTSService
src/pipecat/services/hume/tts.py
Outdated
)

super().__init__(
    pause_frame_processing=True,
Remove. This should be: super().__init__(sample_rate=sample_rate, **kwargs)
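A minimal sketch of the forwarding pattern, using a hypothetical `TTSServiceStub` in place of Pipecat's `TTSService` so it runs on its own. The subclass passes `sample_rate` and any caller-supplied keyword arguments straight through instead of pinning options such as `pause_frame_processing`. Defaulting `sample_rate` to `None` is an assumption here, meant to let a pipeline-level setting such as `audio_out_sample_rate` take effect.

```python
from typing import Any, Optional


class TTSServiceStub:
    """Hypothetical stand-in for Pipecat's TTSService base class."""

    def __init__(self, *, sample_rate: Optional[int] = None, **kwargs: Any) -> None:
        # None is assumed to mean "use the pipeline's output sample rate".
        self.sample_rate = sample_rate
        # Every other base-class option arrives here exactly as the caller passed it.
        self.options = kwargs


class HumeLikeTTSService(TTSServiceStub):
    """Sketch of a subclass that forwards settings instead of hardcoding them."""

    def __init__(self, *, api_key: str, sample_rate: Optional[int] = None, **kwargs: Any) -> None:
        # Forward sample_rate and caller-supplied kwargs; do not pin
        # pause_frame_processing (or any other base option) here.
        super().__init__(sample_rate=sample_rate, **kwargs)
        self._api_key = api_key


service = HumeLikeTTSService(api_key="YOUR_KEY", some_base_option=True)
print(service.sample_rate, service.options)  # None {'some_base_option': True}
```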
src/pipecat/services/hume/tts.py
Outdated
# Request raw PCM chunks in the streaming JSON
pcm_fmt = FormatPcm(type="pcm")

measuring_ttfb = True
The checks around measuring_ttfb aren't needed. You can remove this variable and the if checks.
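A sketch of why the flag is redundant, assuming, as the comment implies, that stopping the TTFB timer a second time is a no-op. `TTFBMetricsStub` and `stream_audio` are hypothetical stand-ins, not Pipecat's metrics API.

```python
import time
from typing import Iterable, Iterator, Optional


class TTFBMetricsStub:
    """Hypothetical time-to-first-byte timer whose stop() is idempotent."""

    def __init__(self) -> None:
        self._start: Optional[float] = None
        self.ttfb: Optional[float] = None

    def start(self) -> None:
        self._start = time.monotonic()
        self.ttfb = None

    def stop(self) -> None:
        if self._start is None:
            return  # Already stopped (or never started): nothing to record.
        self.ttfb = time.monotonic() - self._start
        self._start = None


def stream_audio(chunks: Iterable[bytes], metrics: TTFBMetricsStub) -> Iterator[bytes]:
    metrics.start()
    for chunk in chunks:
        # No measuring_ttfb flag: only the first stop() records a value.
        metrics.stop()
        yield chunk


metrics = TTFBMetricsStub()
audio = list(stream_audio([b"a", b"b", b"c"], metrics))
```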
src/pipecat/services/hume/tts.py
Outdated
"""Hume Text-to-Speech service implementation."""

from __future__ import annotations
Is this needed?
src/pipecat/services/hume/tts.py
Outdated
pcm_bytes = base64.b64decode(audio_b64)
self._audio_bytes += pcm_bytes

# Send the first audio chunk immediately to avoid client-side delays.
I think you want to remove lines 208-216. They cause the initial audio to be spoken twice; removing them fixed the duplicate-audio issue I was seeing when running this file verbatim.
Remove the corresponding first_audio_sent variable on line 194.
src/pipecat/services/hume/tts.py
Outdated
    logger.exception(f"{self} error generating TTS: {e}")
    yield ErrorFrame(error=str(e))
finally:
    # Yield any remaining audio
I think you want to remove this too (lines 230-231). Pipecat takes care of sending audio. You just need to yield TTSAudioRawFrames as you do above.
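Put together, a generator in the shape the comment describes might look like the following sketch: yield audio frames as chunks arrive, report failures as an error frame, and emit the stop frame without flushing any leftover buffer. The dataclasses are hypothetical stand-ins carrying only the fields used here, not Pipecat's frame classes, and `run_tts_sketch` is an illustrative name.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Callable


# Hypothetical stand-ins for Pipecat's frame classes.
@dataclass
class TTSStartedFrame:
    pass


@dataclass
class TTSAudioRawFrame:
    audio: bytes
    sample_rate: int
    num_channels: int = 1


@dataclass
class TTSStoppedFrame:
    pass


@dataclass
class ErrorFrame:
    error: str


async def run_tts_sketch(
    synthesize: Callable[[str], AsyncIterator[bytes]], text: str, sample_rate: int
) -> AsyncIterator[object]:
    """Yield audio frames as they arrive; report errors; never flush leftovers."""
    yield TTSStartedFrame()
    try:
        async for pcm in synthesize(text):
            # Each chunk is yielded once, immediately; nothing is buffered.
            yield TTSAudioRawFrame(audio=pcm, sample_rate=sample_rate)
    except Exception as e:
        yield ErrorFrame(error=str(e))
    # No "remaining audio" step: there is no buffer left to push.
    yield TTSStoppedFrame()


async def _demo_stream(text: str) -> AsyncIterator[bytes]:
    yield b"\x00\x00"
    raise RuntimeError("connection dropped")


async def _collect(gen: AsyncIterator[object]) -> list:
    return [frame async for frame in gen]


frames = asyncio.run(_collect(run_tts_sketch(_demo_stream, "Hello", 48000)))
print([type(f).__name__ for f in frames])
# ['TTSStartedFrame', 'TTSAudioRawFrame', 'ErrorFrame', 'TTSStoppedFrame']
```

In this sketch the stop frame is yielded after the try/except rather than from a `finally` block, since yielding inside `finally` would fail if the async generator were closed early.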
src/pipecat/services/hume/tts.py
Outdated
except ModuleNotFoundError as e:  # pragma: no cover - import-time guidance
    logger.error(f"Exception: {e}")
    logger.error("In order to use Hume, you need to `pip install pipecat-ai[hume]`.")
    raise
Should be:
- raise
+ raise Exception(f"Missing module: {e}")
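The suggested pattern, wrapped in a hypothetical `require_module` helper so it can be exercised in isolation. It uses the standard `logging` module in place of loguru and `importlib.import_module` in place of a module-level `from hume import ...`; both substitutions are assumptions made to keep the example self-contained.

```python
import importlib
import logging
from types import ModuleType

logger = logging.getLogger(__name__)


def require_module(module_name: str, extra: str) -> ModuleType:
    """Import module_name, or log install guidance and raise.

    Follows the suggested pattern: log the hint, then raise a generic
    Exception that names the missing module.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as e:
        logger.error(f"Exception: {e}")
        logger.error(f"In order to use this service, you need to `pip install pipecat-ai[{extra}]`.")
        raise Exception(f"Missing module: {e}")


json_module = require_module("json", "hume")  # present in the stdlib, so this succeeds
```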
from pipecat.services.hume.tts import HUME_SAMPLE_RATE, HumeTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
Update import path:
- from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
+ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.network.fastapi_websocket import FastAPIWebsocketParams
from pipecat.transports.services.daily import DailyParams
Update import path:
- from pipecat.transports.services.daily import DailyParams
+ from pipecat.transports.daily.transport import DailyParams
    },
]

context = OpenAILLMContext(messages)
Sorry about this. We just changed the pattern for this. To avoid a deprecation warning, use:
context = LLMContext(messages)
context_aggregator = LLMContextAggregatorPair(context)
Import paths are:
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([context_aggregator.user().get_context_frame()])
Related to the LLMContext change above, you need:
- await task.queue_frames([context_aggregator.user().get_context_frame()])
+ await task.queue_frames([LLMRunFrame()])
LLMRunFrame is imported from:
from pipecat.frames.frames import LLMRunFrame
async def bot(runner_args: RunnerArguments):
    """Main bot entry point compatible with Pipecat Cloud."""
    runner_args.transport = "webrtc"
Remove.
You can run foundational examples using the Pipecat development runner, which takes command-line arguments:
- SmallWebRTCTransport:
  uv run 07ad-interruptible-hume.py
- DailyTransport:
  uv run 07ad-interruptible-hume.py --transport daily
- Twilio:
  uv run 07ad-interruptible-hume.py --transport twilio --proxy YOUR_NGROK_URL
Let's stick to the pattern in this example, so that using these are uniform.
@@ -0,0 +1,124 @@
#
07ad is already taken. Let's rename this to 07ae.
LGTM! Thank you 🙌
All that remains is to rebase on the latest main and add a changelog entry. Also, make sure the code is linted. You can install the pre-commit hook (uv run pre-commit install) or run the ./scripts/fix-ruff.sh script to clean up.
137bbe8 to c1492c5
Please describe the changes in your PR. If it is addressing an issue, please reference that as well.
0.11.2) dependency for service