@ingridxx

TTS request is valid but missing output file when running test_hume_tts.py

@@ -4,5 +4,7 @@
# SPDX-License-Identifier: BSD 2-Clause License
#
Contributor

You can leave this file blank. The pattern is to import from:

from pipecat.services.hume.tts import HumeTTSService

in application code.



class HumeTTSService(TTSService):
class UtteranceParams:
Contributor

@markbackman Sep 29, 2025

Nit: Maybe define this outside of the HumeTTSService class for ease of use?
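
A minimal sketch of what a module-level definition could look like. The field names (`description`, `speed`) and the `HumeTTSServiceSketch` stand-in are hypothetical placeholders, not the PR's actual fields, and a plain dataclass is used here only to keep the example dependency-free:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UtteranceParams:
    """Per-utterance options. Field names here are hypothetical placeholders."""

    description: Optional[str] = None
    speed: Optional[float] = None


class HumeTTSServiceSketch:
    """Stand-in for the service; only the parameter handling is shown."""

    def __init__(self, *, utterance_params: Optional[UtteranceParams] = None):
        # Fall back to default parameters when the caller passes none.
        self._utterance_params = utterance_params or UtteranceParams()
```

With this layout, callers can import `UtteranceParams` directly instead of reaching through the service class.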

api_key: str,
aiohttp_session: aiohttp.ClientSession,
utterance_params: Optional[UtteranceParams] = None,
format_type: str = "mp3",
Contributor

Do you support wav?

aiohttp_session: aiohttp.ClientSession,
utterance_params: Optional[UtteranceParams] = None,
format_type: str = "mp3",
sample_rate: int = 48000,
Contributor

@markbackman Sep 29, 2025

You don't want to set the sample rate here. Instead, you want it to be set to None.

I wrote the following guidance recently on how this works:

### Sample Rate Handling

Sample rates are set via PipelineParams and passed to each frame processor at initialization. The pattern is to _not_ set the sample rate value in the constructor of a given service. Instead, use the `start()` method to initialize sample rates from the frame:

```python
async def start(self, frame: StartFrame):
    """Start the service."""
    await super().start(frame)
    self._settings["output_format"]["sample_rate"] = self.sample_rate
    await self._connect()
```

Note that `self.sample_rate` is a `@property` set in the `TTSService` base class, which provides access to the private sample rate value obtained from the `StartFrame`.


Check out other TTS services to see how the pattern is applied.
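
To make the flow concrete, here is a self-contained sketch that uses stand-in classes rather than the real pipecat ones. `StartFrame`, `TTSServiceSketch`, `HumeTTSSketch`, the `_settings` layout, and the `resolve_rate` helper are all assumptions made for illustration. In particular, letting an explicit constructor value take precedence over the frame's value is an assumed behavior, not something confirmed in this thread:

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartFrame:
    """Stand-in for the real StartFrame; carries the pipeline's output rate."""

    audio_out_sample_rate: int


class TTSServiceSketch:
    """Stand-in base class that resolves the sample rate at start time."""

    def __init__(self, *, sample_rate: Optional[int] = None):
        self._init_sample_rate = sample_rate  # None means "use the pipeline value"
        self._sample_rate = 0

    @property
    def sample_rate(self) -> int:
        return self._sample_rate

    async def start(self, frame: StartFrame) -> None:
        # Assumed precedence: explicit constructor value, else the frame's value.
        self._sample_rate = self._init_sample_rate or frame.audio_out_sample_rate


class HumeTTSSketch(TTSServiceSketch):
    def __init__(self, *, sample_rate: Optional[int] = None):
        super().__init__(sample_rate=sample_rate)
        self._settings = {"sample_rate": None}  # hypothetical settings layout

    async def start(self, frame: StartFrame) -> None:
        await super().start(frame)
        # Only now is the sample rate known, so copy it into the settings.
        self._settings["sample_rate"] = self.sample_rate


async def resolve_rate(service: HumeTTSSketch, pipeline_rate: int) -> int:
    """Hypothetical helper: start the service and report the resolved rate."""
    await service.start(StartFrame(audio_out_sample_rate=pipeline_rate))
    return service.sample_rate
```

Leaving the constructor default at `None` is what lets the pipeline-wide value flow through; a hard-coded default such as 48000 would bypass it.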

sample_rate: int = 48000,
**kwargs,
):
super().__init__(
Contributor

@markbackman Sep 29, 2025

For an HTTP-based TTS service, you want the following:

super().__init__(sample_rate=sample_rate, **kwargs)
  • aggregate_sentences: defaults to True, so there is no need to set it.
  • push_text_frames: the TTS service should push text frames downstream, which is required for context aggregation. The default is True.
  • push_stop_frames: leave this at the default, False. You want the HumeTTSService to yield the stop frame itself instead.

self._session = aiohttp_session
self._utterance_params = utterance_params or self.UtteranceParams()
self._format_type = format_type
self._base_url = "https://api.hume.ai/v0/tts/stream/json"
Contributor

In other services, the base_url is an arg. It could make sense here, too, for flexibility.
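
One possible shape for this, as a sketch only: `HumeTTSConfigSketch` and `DEFAULT_BASE_URL` are hypothetical names, the default URL is the one hard-coded in the diff, and stripping the trailing slash is an illustrative choice rather than an established convention:

```python
DEFAULT_BASE_URL = "https://api.hume.ai/v0/tts/stream/json"


class HumeTTSConfigSketch:
    """Stand-in showing only how a configurable base URL could be stored."""

    def __init__(self, *, api_key: str, base_url: str = DEFAULT_BASE_URL):
        self._api_key = api_key
        # Normalize so later URL handling does not have to deal with a trailing slash.
        self._base_url = base_url.rstrip("/")
```

Callers who need a proxy or a staging endpoint can then pass `base_url=...` without patching the service.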

self._utterance_params = utterance_params or self.UtteranceParams()
self._format_type = format_type
self._base_url = "https://api.hume.ai/v0/tts/stream/json"
self._cumulative_time = 0
Contributor

Remove since this is unused.

Suggested change
self._cumulative_time = 0

self._format_type = format_type
self._base_url = "https://api.hume.ai/v0/tts/stream/json"
self._cumulative_time = 0
self._started = False
Contributor

Remove since this is unused.

Suggested change
self._started = False


yield TTSStartedFrame()

async for line in response.content.iter_chunked(1024 * 1024): # 1MB chunks
Contributor

The TTSService class provides a chunk_size @property that you can use here instead of the hard-coded 1MB:

    @property
    def chunk_size(self) -> int:
        """Get the recommended chunk size for audio streaming.

        This property indicates how much audio we download (from TTS services
        that require chunking) before we start pushing the first audio
        frame. This will make sure we download the rest of the audio while audio
        is being played without causing audio glitches (specially at the
        beginning). Of course, this will also depend on how fast the TTS service
        generates bytes.

        Returns:
            The recommended chunk size in bytes.
        """
        CHUNK_SECONDS = 0.5
        return int(self.sample_rate * CHUNK_SECONDS * 2)  # 2 bytes/sample

We've found this works well to avoid audio glitches in playback. It's helpful to use the property so we can uniformly adjust all HTTP-based services.
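
As a worked example of the arithmetic: at 24000 Hz the property returns `int(24000 * 0.5 * 2) = 24000` bytes, which is half a second of 16-bit mono PCM. The sketch below reproduces that calculation as a free function and adds a hypothetical `rechunk` helper that regroups arbitrarily sized pieces into chunks of that size. Both names are illustrative and not part of pipecat, and the bytes-to-seconds mapping only holds for raw PCM, not for compressed formats such as mp3:

```python
from typing import Iterable, Iterator

CHUNK_SECONDS = 0.5
BYTES_PER_SAMPLE = 2  # 16-bit PCM


def chunk_size(sample_rate: int) -> int:
    """Same arithmetic as the property quoted above, as a free function."""
    return int(sample_rate * CHUNK_SECONDS * BYTES_PER_SAMPLE)


def rechunk(pieces: Iterable[bytes], size: int) -> Iterator[bytes]:
    """Regroup arbitrarily sized byte pieces into chunks of `size` bytes."""
    buffer = bytearray()
    for piece in pieces:
        buffer.extend(piece)
        # Emit every full chunk currently available in the buffer.
        while len(buffer) >= size:
            yield bytes(buffer[:size])
            del buffer[:size]
    # Flush whatever is left as a final, shorter chunk.
    if buffer:
        yield bytes(buffer)
```

In the service itself, the simpler change is likely to pass `self.chunk_size` to `iter_chunked` in place of `1024 * 1024`.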

@@ -0,0 +1,38 @@
import asyncio
Contributor

Instead of a test, can you write a foundational example? Follow the 07-interruptible pattern.

@markbackman
Contributor

Nice start! Sorry for the delayed feedback. We've been busy.

Four other things to do:

  1. Update the README with your service.
  2. Add the env var to env.example.
  3. Add docs: https://github.com/pipecat-ai/docs
  4. Make sure to lint your code. You can install the pre-commit hook using `uv run pre-commit install` from the base of the repo.

@markbackman
Contributor

Oops. Closing in favor of #2518.
