add hume tts integration #2005
| @@ -4,5 +4,7 @@ | |||
| # SPDX-License-Identifier: BSD 2-Clause License | |||
| # | |||
You can leave this file blank. The pattern is to import from:
from pipecat.services.hume.tts import HumeTTSService
in application code.
| class HumeTTSService(TTSService): | ||
| class UtteranceParams: |
Nit: Maybe define this outside of the HumeTTSService class for ease of use?
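A minimal sketch of what that could look like, using a standard-library dataclass as a stand-in. The field names (`description`, `speed`) and the `HumeTTSServiceSketch` class are hypothetical and not taken from the PR; the point is only that a module-level class can be imported and constructed without reaching through `HumeTTSService`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UtteranceParams:
    """Module-level params class. Field names are illustrative assumptions."""

    description: Optional[str] = None
    speed: Optional[float] = None


class HumeTTSServiceSketch:
    """Hypothetical stand-in for the service; it does not subclass TTSService."""

    def __init__(self, utterance_params: Optional[UtteranceParams] = None):
        # Fall back to default params when the caller passes nothing.
        self._utterance_params = utterance_params or UtteranceParams()


# Callers construct the params directly instead of via the service class.
service = HumeTTSServiceSketch(UtteranceParams(speed=1.2))
```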
| api_key: str, | ||
| aiohttp_session: aiohttp.ClientSession, | ||
| utterance_params: Optional[UtteranceParams] = None, | ||
| format_type: str = "mp3", |
Do you support wav?
| aiohttp_session: aiohttp.ClientSession, | ||
| utterance_params: Optional[UtteranceParams] = None, | ||
| format_type: str = "mp3", | ||
| sample_rate: int = 48000, |
You don't want to set the sample rate here. Instead, you want it to default to None.
I wrote the following guidance recently on how this works:
### Sample Rate Handling
Sample rates are set via PipelineParams and passed to each frame processor at initialization. The pattern is to _not_ set the sample rate value in the constructor of a given service. Instead, use the `start()` method to initialize sample rates from the frame:
```python
async def start(self, frame: StartFrame):
    """Start the service."""
    await super().start(frame)
    self._settings["output_format"]["sample_rate"] = self.sample_rate
    await self._connect()
```
Note that `self.sample_rate` is a `@property` set in the TTSService base class, which provides access to the private sample rate value obtained from the StartFrame.
Check out other TTS services to see how the pattern is applied.
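To make the flow concrete, here is a self-contained sketch of the pattern using stand-in classes. `StartFrame`, `BaseTTS`, and `HumeTTSSketch` below are simplified assumptions written for illustration, not pipecat's actual implementations, and the `audio_out_sample_rate` field name is likewise assumed. The sketch shows a constructor that defaults to `None` and a `start()` that resolves the rate from the frame.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartFrame:
    """Stand-in for the frame carrying the pipeline's output sample rate."""

    audio_out_sample_rate: int


class BaseTTS:
    """Stand-in base class; mirrors the pattern, not pipecat's real code."""

    def __init__(self, sample_rate: Optional[int] = None):
        self._init_sample_rate = sample_rate  # None means "use the pipeline's rate"
        self._sample_rate = 0

    @property
    def sample_rate(self) -> int:
        return self._sample_rate

    async def start(self, frame: StartFrame):
        # An explicit constructor value wins; otherwise take the frame's rate.
        self._sample_rate = self._init_sample_rate or frame.audio_out_sample_rate


class HumeTTSSketch(BaseTTS):
    def __init__(self, sample_rate: Optional[int] = None):
        super().__init__(sample_rate=sample_rate)
        self._settings = {"sample_rate": None}  # filled in at start()

    async def start(self, frame: StartFrame):
        await super().start(frame)
        self._settings["sample_rate"] = self.sample_rate


service = HumeTTSSketch()  # no sample rate passed to the constructor
asyncio.run(service.start(StartFrame(audio_out_sample_rate=24000)))
```

With this shape, a hardcoded default such as `48000` in the constructor would override whatever the pipeline is configured for, which is why the default is `None`.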
| sample_rate: int = 48000, | ||
| **kwargs, | ||
| ): | ||
| super().__init__( |
For an HTTP based TTS service, you want the following:
super().__init__(sample_rate=sample_rate, **kwargs)
- aggregate_sentences: defaults to True, so no need to set
- push_text_frames: the TTS service should push text frames downstream. This is required for context aggregation. Default is True.
- push_stop_frames: You want the HumeTTSService to yield the frame instead. False is the default and desired value.
| self._session = aiohttp_session | ||
| self._utterance_params = utterance_params or self.UtteranceParams() | ||
| self._format_type = format_type | ||
| self._base_url = "https://api.hume.ai/v0/tts/stream/json" |
In other services, the base_url is an arg. It could make sense here, too, for flexibility.
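One way this might look, as a rough sketch: the URL from the diff becomes the default for a `base_url` keyword argument. The class name `HumeTTSConfigSketch` and the constant `DEFAULT_BASE_URL` are hypothetical, and the class omits everything else the real service does.

```python
# Default endpoint, copied from the hardcoded value in the diff.
DEFAULT_BASE_URL = "https://api.hume.ai/v0/tts/stream/json"


class HumeTTSConfigSketch:
    """Hypothetical stand-in showing only the base_url handling."""

    def __init__(self, api_key: str, base_url: str = DEFAULT_BASE_URL):
        self._api_key = api_key
        # Callers can point at a proxy or another endpoint without editing the class.
        self._base_url = base_url


default_service = HumeTTSConfigSketch(api_key="test-key")
proxied_service = HumeTTSConfigSketch(api_key="test-key", base_url="http://localhost:8080/tts")
```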
| self._utterance_params = utterance_params or self.UtteranceParams() | ||
| self._format_type = format_type | ||
| self._base_url = "https://api.hume.ai/v0/tts/stream/json" | ||
| self._cumulative_time = 0 |
Remove since this is unused.
| self._cumulative_time = 0 |
| self._format_type = format_type | ||
| self._base_url = "https://api.hume.ai/v0/tts/stream/json" | ||
| self._cumulative_time = 0 | ||
| self._started = False |
Remove since this is unused.
| self._started = False |
| yield TTSStartedFrame() | ||
| async for line in response.content.iter_chunked(1024 * 1024): # 1MB chunks |
From the TTSService class, there's a chunk_size @property.
    @property
    def chunk_size(self) -> int:
        """Get the recommended chunk size for audio streaming.

        This property indicates how much audio we download (from TTS services
        that require chunking) before we start pushing the first audio
        frame. This will make sure we download the rest of the audio while audio
        is being played without causing audio glitches (specially at the
        beginning). Of course, this will also depend on how fast the TTS service
        generates bytes.

        Returns:
            The recommended chunk size in bytes.
        """
        CHUNK_SECONDS = 0.5
        return int(self.sample_rate * CHUNK_SECONDS * 2)  # 2 bytes/sample
We've found this works well to avoid audio glitches in playback. It's helpful to use the property so we can uniformly adjust all HTTP-based services.
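As a worked example of the arithmetic, and a sketch of how a stream might be regrouped to that size: the standalone functions below are illustrative helpers, not part of pipecat. `chunk_size` repeats the formula from the property (assuming 16-bit mono PCM), and `rechunk` is a hypothetical helper that regroups arbitrary-sized byte pieces into fixed-size chunks.

```python
from typing import Iterable, Iterator


def chunk_size(sample_rate: int, chunk_seconds: float = 0.5) -> int:
    """Bytes in `chunk_seconds` of 16-bit mono PCM at `sample_rate`."""
    return int(sample_rate * chunk_seconds * 2)  # 2 bytes/sample


def rechunk(pieces: Iterable[bytes], size: int) -> Iterator[bytes]:
    """Regroup arbitrary-sized byte pieces into chunks of `size` bytes."""
    buffer = bytearray()
    for piece in pieces:
        buffer.extend(piece)
        # Emit as many full chunks as the buffer currently holds.
        while len(buffer) >= size:
            yield bytes(buffer[:size])
            del buffer[:size]
    if buffer:
        yield bytes(buffer)  # final partial chunk


size = chunk_size(24000)  # 24000 * 0.5 * 2 = 24000 bytes
chunks = list(rechunk([b"\x00" * 10000] * 5, size))
```

In the PR's loop, the equivalent change would presumably be to pass `self.chunk_size` to `iter_chunked` in place of `1024 * 1024`, so the first audio frame is pushed after roughly half a second of audio rather than after a full 1 MB download.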
| @@ -0,0 +1,38 @@ | |||
| import asyncio | |||
Instead of a test, can you write a foundational example? Follow the 07-interruptible pattern.
Nice start! Sorry for the delayed feedback. We've been busy. Four other things to do:
Oops. Closing in favor of:
TTS request is valid but missing output file when running
test_hume_tts.py