add hume tts integration #2005

Closed

ingridxx wants to merge 1 commit into pipecat-ai:main from ingridxx:ingrid/hume-tts-integration

ingridxx commented Jun 13, 2025

The TTS request is valid, but the output file is missing when running `test_hume_tts.py`.


add hume tts integration (e178817)

markbackman reviewed

src/pipecat/services/hume/__init__.py

    
    @@ -4,5 +4,7 @@
    # SPDX-License-Identifier: BSD 2-Clause License
    #

Contributor

markbackman Sep 29, 2025

You can leave this file blank. The pattern is for application code to import directly from the module:

    from pipecat.services.hume.tts import HumeTTSService

markbackman reviewed

src/pipecat/services/hume/tts.py

    
    class HumeTTSService(TTSService):
        class UtteranceParams:

Contributor

markbackman Sep 29, 2025 (edited)

Nit: Maybe define this outside of the HumeTTSService class for ease of use?
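A minimal sketch of the nit, assuming a plain `dataclass` is acceptable (Pipecat services often use Pydantic models for their parameter classes instead). The fields `description`, `speed`, and `trailing_silence` and the `to_payload` helper are hypothetical, loosely modeled on Hume's utterance options rather than taken from this PR, and `HumeTTSService` here is a stand-in, not the real subclass.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


# Hypothetical fields, loosely modeled on Hume's utterance options (not from this PR).
@dataclass
class UtteranceParams:
    """Per-utterance options, defined at module level so callers can import it directly."""

    description: Optional[str] = None
    speed: Optional[float] = None
    trailing_silence: Optional[float] = None

    def to_payload(self, text: str) -> Dict[str, Any]:
        """Build an utterance dict for `text`, omitting options left unset."""
        payload: Dict[str, Any] = {"text": text}
        payload.update({k: v for k, v in asdict(self).items() if v is not None})
        return payload


class HumeTTSService:
    """Stand-in for the real TTSService subclass; only shows the parameter wiring."""

    def __init__(self, utterance_params: Optional[UtteranceParams] = None):
        # Reference the module-level class instead of a nested self.UtteranceParams.
        self._utterance_params = utterance_params or UtteranceParams()
```

With the class at module level, callers can build the params without reaching through the service class first.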

markbackman reviewed

src/pipecat/services/hume/tts.py

    
        api_key: str,
        aiohttp_session: aiohttp.ClientSession,
        utterance_params: Optional[UtteranceParams] = None,
        format_type: str = "mp3",

Contributor

markbackman Sep 29, 2025

Do you support wav?
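If WAV (or raw PCM) output is supported, validating `format_type` at construction time surfaces typos early. This sketch assumes the accepted values are `mp3`, `wav`, and `pcm` and that the request carries a `{"type": ...}` format object; both are assumptions to check against Hume's API reference, and `build_format` is a hypothetical helper.

```python
from typing import Dict

# Assumed set of output formats; confirm against Hume's TTS API reference.
SUPPORTED_FORMATS = ("mp3", "wav", "pcm")


def build_format(format_type: str) -> Dict[str, str]:
    """Validate `format_type` (case-insensitive) and return a request format object."""
    fmt = format_type.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(
            f"Unsupported format {format_type!r}; expected one of {', '.join(SUPPORTED_FORMATS)}"
        )
    return {"type": fmt}
```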

markbackman reviewed

src/pipecat/services/hume/tts.py

    
        aiohttp_session: aiohttp.ClientSession,
        utterance_params: Optional[UtteranceParams] = None,
        format_type: str = "mp3",
        sample_rate: int = 48000,

Contributor

markbackman Sep 29, 2025 (edited)

You don't want to set the sample rate here. Instead, you want it to default to `None`.

I wrote the following guidance recently on how this works:

### Sample Rate Handling

Sample rates are set via PipelineParams and passed to each frame processor at initialization. The pattern is to _not_ set the sample rate value in the constructor of a given service. Instead, use the `start()` method to initialize sample rates from the frame:

```python
async def start(self, frame: StartFrame):
    """Start the service."""
    await super().start(frame)
    self._settings["output_format"]["sample_rate"] = self.sample_rate
    await self._connect()
```

Note that `self.sample_rate` is a `@property` set in the `TTSService` base class, which provides access to the private sample rate value obtained from the `StartFrame`.


Check out other TTS services to see how the pattern is applied.
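The sample-rate pattern can be exercised without Pipecat installed. In this sketch `StartFrame` and `TTSServiceStub` are hypothetical stand-ins that roughly mirror how the base class resolves `sample_rate` (an explicit constructor value if one is given, otherwise the pipeline's output rate from the `StartFrame`); the `_settings` layout is also an assumption.

```python
import asyncio
from typing import Any, Dict, Optional


class StartFrame:
    """Stand-in for Pipecat's StartFrame; only the output sample rate is modeled."""

    def __init__(self, audio_out_sample_rate: int):
        self.audio_out_sample_rate = audio_out_sample_rate


class TTSServiceStub:
    """Hypothetical stand-in for the sample-rate handling in TTSService."""

    def __init__(self, *, sample_rate: Optional[int] = None, **kwargs):
        self._init_sample_rate = sample_rate  # explicit override; normally None
        self._sample_rate = 0  # unknown until start()

    @property
    def sample_rate(self) -> int:
        return self._sample_rate

    async def start(self, frame: StartFrame):
        # An explicit constructor value wins; otherwise use the pipeline's rate.
        self._sample_rate = self._init_sample_rate or frame.audio_out_sample_rate


class HumeTTSService(TTSServiceStub):
    def __init__(self, *, sample_rate: Optional[int] = None, **kwargs):
        super().__init__(sample_rate=sample_rate, **kwargs)
        self._settings: Dict[str, Any] = {"sample_rate": None}  # assumed layout

    async def start(self, frame: StartFrame):
        await super().start(frame)
        # Only after start() is self.sample_rate known.
        self._settings["sample_rate"] = self.sample_rate


if __name__ == "__main__":
    service = HumeTTSService()
    asyncio.run(service.start(StartFrame(audio_out_sample_rate=24000)))
    print(service.sample_rate)
```

Leaving `sample_rate=None` in the constructor means the same service works unchanged in pipelines configured for different output rates.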

markbackman reviewed

src/pipecat/services/hume/tts.py

    
        sample_rate: int = 48000,
        **kwargs,
    ):
        super().__init__(

Contributor

markbackman Sep 29, 2025 (edited)

For an HTTP-based TTS service, you want the following:

    super().__init__(sample_rate=sample_rate, **kwargs)

- `aggregate_sentences`: defaults to `True`, so there's no need to set it.
- `push_text_frames`: the TTS service should push text frames downstream; this is required for context aggregation. Defaults to `True`.
- `push_stop_frames`: you want `HumeTTSService` to yield the `TTSStoppedFrame` itself instead. `False` is both the default and the desired value.
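A sketch of what `push_stop_frames=False` implies for `run_tts`: the service itself yields `TTSStartedFrame`, then the audio frames, then `TTSStoppedFrame`. The frame classes are minimal stand-ins for Pipecat's frames, and this `run_tts` takes pre-fetched chunks instead of making an HTTP request, purely for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List, Union


# Minimal stand-ins for Pipecat's frame classes.
class TTSStartedFrame:
    pass


class TTSStoppedFrame:
    pass


@dataclass
class TTSAudioRawFrame:
    audio: bytes
    sample_rate: int
    num_channels: int = 1


Frame = Union[TTSStartedFrame, TTSAudioRawFrame, TTSStoppedFrame]


async def run_tts(chunks: List[bytes], sample_rate: int) -> AsyncGenerator[Frame, None]:
    """Yield the frame sequence a service emits when push_stop_frames=False."""
    yield TTSStartedFrame()
    for chunk in chunks:
        yield TTSAudioRawFrame(audio=chunk, sample_rate=sample_rate)
    # The base class won't push this automatically, so the service yields it.
    yield TTSStoppedFrame()


async def collect(chunks: List[bytes]) -> List[Frame]:
    return [frame async for frame in run_tts(chunks, sample_rate=24000)]


if __name__ == "__main__":
    print([type(f).__name__ for f in asyncio.run(collect([b"\x00\x00"]))])
```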

markbackman reviewed

src/pipecat/services/hume/tts.py

    
        self._session = aiohttp_session
        self._utterance_params = utterance_params or self.UtteranceParams()
        self._format_type = format_type
        self._base_url = "https://api.hume.ai/v0/tts/stream/json"

Contributor

markbackman Sep 29, 2025

In other services, `base_url` is a constructor argument. It could make sense to expose it here too, for flexibility.
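A minimal sketch of exposing `base_url` as a keyword argument whose default is the URL already hard-coded in the PR. `DEFAULT_BASE_URL` is a hypothetical constant name, and the class is a stand-in for the real `TTSService` subclass.

```python
DEFAULT_BASE_URL = "https://api.hume.ai/v0/tts/stream/json"


class HumeTTSService:
    """Stand-in for the real TTSService subclass; only base_url handling is shown."""

    def __init__(self, *, api_key: str, base_url: str = DEFAULT_BASE_URL, **kwargs):
        self._api_key = api_key
        # Callers can point at a proxy or another API version without subclassing.
        self._base_url = base_url
```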

markbackman reviewed

src/pipecat/services/hume/tts.py

    
        self._utterance_params = utterance_params or self.UtteranceParams()
        self._format_type = format_type
        self._base_url = "https://api.hume.ai/v0/tts/stream/json"
        self._cumulative_time = 0

Contributor

markbackman Sep 29, 2025

Remove since this is unused.

Suggested change (remove this line):

    self._cumulative_time = 0

markbackman reviewed

src/pipecat/services/hume/tts.py

    
        self._format_type = format_type
        self._base_url = "https://api.hume.ai/v0/tts/stream/json"
        self._cumulative_time = 0
        self._started = False

Contributor

markbackman Sep 29, 2025

Remove since this is unused.

Suggested change (remove this line):

    self._started = False

markbackman reviewed

src/pipecat/services/hume/tts.py

    
            yield TTSStartedFrame()
            async for line in response.content.iter_chunked(1024 * 1024):  # 1MB chunks

Contributor

markbackman Sep 29, 2025

The `TTSService` base class provides a `chunk_size` `@property`:

    @property
    def chunk_size(self) -> int:
        """Get the recommended chunk size for audio streaming.

        This property indicates how much audio we download (from TTS services
        that require chunking) before we start pushing the first audio
        frame. This will make sure we download the rest of the audio while audio
        is being played without causing audio glitches (especially at the
        beginning). Of course, this will also depend on how fast the TTS service
        generates bytes.

        Returns:
            The recommended chunk size in bytes.
        """
        CHUNK_SECONDS = 0.5
        return int(self.sample_rate * CHUNK_SECONDS * 2)  # 2 bytes/sample

We've found this works well to avoid audio glitches during playback. Using the property also lets us adjust all HTTP-based services uniformly.
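To make the arithmetic concrete: at 24 kHz, 0.5 s of 16-bit mono audio is 24000 * 0.5 * 2 = 24000 bytes. The sketch below reproduces the property's formula as a free function and uses `asyncio.StreamReader.read(n)` as a stand-in for aiohttp's `response.content.iter_chunked(n)`; `read_chunks` and `demo` are hypothetical helpers. In the real service you would pass `self.chunk_size` where the PR currently hard-codes `1024 * 1024`.

```python
import asyncio
from typing import AsyncIterator, List

CHUNK_SECONDS = 0.5
BYTES_PER_SAMPLE = 2  # 16-bit mono PCM


def chunk_size(sample_rate: int) -> int:
    """Same formula as TTSService.chunk_size: half a second of 16-bit mono audio."""
    return int(sample_rate * CHUNK_SECONDS * BYTES_PER_SAMPLE)


async def read_chunks(reader: asyncio.StreamReader, size: int) -> AsyncIterator[bytes]:
    """Yield at most `size` bytes at a time until EOF (stand-in for iter_chunked)."""
    while True:
        data = await reader.read(size)
        if not data:
            break
        yield data


async def demo(payload: bytes, sample_rate: int) -> List[int]:
    """Feed `payload` into a reader and report the size of each chunk read."""
    reader = asyncio.StreamReader()
    reader.feed_data(payload)
    reader.feed_eof()
    return [len(chunk) async for chunk in read_chunks(reader, chunk_size(sample_rate))]


if __name__ == "__main__":
    print(chunk_size(24000))
```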

markbackman reviewed

test_hume_tts.py

    
    @@ -0,0 +1,38 @@
    import asyncio

Contributor

markbackman Sep 29, 2025

Instead of a test, can you write a foundational example? Follow the 07-interruptible pattern.

Contributor

markbackman commented Sep 29, 2025

Nice start! Sorry for the delayed feedback. We've been busy.

Four other things to do:

1. Update the README with your service.
2. Add the env var to `env.example`.
3. Add docs: https://github.com/pipecat-ai/docs
4. Make sure to lint your code. You can install the pre-commit hook by running `uv run pre-commit install` from the root of the repo.
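For the `env.example` item, a sketch of how an example script might read the key and fail fast when it is missing. The variable name `HUME_API_KEY` is an assumption; use whatever name the final `env.example` entry defines.

```python
import os


def get_hume_api_key() -> str:
    """Return the Hume API key from the environment, failing fast if it's unset."""
    # HUME_API_KEY is an assumed name; match whatever env.example ends up defining.
    key = os.getenv("HUME_API_KEY")
    if not key:
        raise RuntimeError("HUME_API_KEY is not set; add it to your .env file")
    return key
```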

Contributor

markbackman commented Sep 29, 2025

Oops. Closing in favor of:
#2518

markbackman closed this
