Speech To Speech Streaming · ElevenLabs API Documentation

provider speech-to-speech POST /v1/speech-to-speech/{voice_id}/stream


Speech To Speech Streaming

Stream audio from one voice to another. Maintain full control over emotion, timing and delivery.

voice_id path required: ID of the voice to use. To list all available voices, call https://api.elevenlabs.io/v1/voices; string
enable_logging query: When enable_logging is set to false, zero retention mode is used for the request. History features, including request stitching, are then unavailable for that request. Zero retention mode may only be used by enterprise customers.; boolean
optimize_streaming_latency query: Turns on latency optimizations at some cost of quality. The best possible final latency varies by model. Defaults to None. Possible values:
  0 - default mode (no latency optimizations)
  1 - normal latency optimizations (about 50% of the possible latency improvement of option 3)
  2 - strong latency optimizations (about 75% of the possible latency improvement of option 3)
  3 - max latency optimizations
  4 - max latency optimizations, with the text normalizer also turned off for even more latency savings (best latency, but can mispronounce e.g. numbers and dates)
output_format query: Output format of the generated audio, formatted as codec_sample_rate_bitrate. For example, an MP3 with a 22.05 kHz sample rate at 32 kbps is represented as mp3_22050_32. MP3 with a 192 kbps bitrate requires a subscription to the Creator tier or above. PCM with a 44.1 kHz sample rate requires a subscription to the Pro tier or above. Note that the μ-law format (sometimes written mu-law, often approximated as u-law) is commonly used for Twilio audio inputs.; enum: mp3_22050_32, mp3_24000_48, mp3_44100_32, mp3_44100_64…
xi-api-key header: Your API key. This is required by most endpoints to access our API programmatically. You can view your xi-api-key using the 'Profile' tab on the website.
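
The parameters above map onto a request URL in a fairly mechanical way. The following is a minimal sketch, not the official client: `buildStreamUrl`, `parseOutputFormat`, and the `StreamQuery` type are hypothetical names introduced here for illustration, and the base URL is assumed from the voices URL quoted in the voice_id description. The parser also assumes that some formats may omit the bitrate part.

```typescript
// Host assumed from the voices URL quoted in the voice_id description.
const BASE_URL = "https://api.elevenlabs.io";

// Query parameters documented for this endpoint; all are optional.
interface StreamQuery {
  enable_logging?: boolean;
  optimize_streaming_latency?: 0 | 1 | 2 | 3 | 4;
  output_format?: string;
}

// Hypothetical helper: builds the full request URL for a given voice.
function buildStreamUrl(voiceId: string, query: StreamQuery = {}): string {
  if (voiceId.length === 0) {
    throw new Error("voice_id is required");
  }
  const pairs: string[] = [];
  if (query.enable_logging !== undefined) {
    pairs.push(`enable_logging=${query.enable_logging}`);
  }
  if (query.optimize_streaming_latency !== undefined) {
    pairs.push(`optimize_streaming_latency=${query.optimize_streaming_latency}`);
  }
  if (query.output_format !== undefined) {
    pairs.push(`output_format=${encodeURIComponent(query.output_format)}`);
  }
  const path = `/v1/speech-to-speech/${encodeURIComponent(voiceId)}/stream`;
  const suffix = pairs.length > 0 ? `?${pairs.join("&")}` : "";
  return `${BASE_URL}${path}${suffix}`;
}

interface ParsedFormat {
  codec: string;
  sampleRateHz: number;
  bitrateKbps: number | null; // null when the format string has no bitrate part (assumption)
}

// Hypothetical helper: splits a codec_sample_rate_bitrate string into its parts.
function parseOutputFormat(format: string): ParsedFormat {
  const parts = format.split("_");
  const numericParts = parts.slice(1);
  const valid =
    parts.length >= 2 &&
    parts.length <= 3 &&
    parts[0].length > 0 &&
    numericParts.every((p) => /^\d+$/.test(p));
  if (!valid) {
    throw new Error(`Unrecognized output_format: ${format}`);
  }
  return {
    codec: parts[0],
    sampleRateHz: Number(parts[1]),
    bitrateKbps: parts.length === 3 ? Number(parts[2]) : null,
  };
}
```

With these helpers, `buildStreamUrl("abc123", { output_format: "mp3_22050_32" })` yields `https://api.elevenlabs.io/v1/speech-to-speech/abc123/stream?output_format=mp3_22050_32`, where `abc123` is a placeholder voice ID.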


speechToSpeechStream

POST /v1/speech-to-speech/{voice_id}/stream


Code snippet


TypeScript

import elevenlabs from '@utdk/elevenlabs';

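// voice_id is a required path parameter. It is assumed here to be passed
// alongside the query parameters; replace the placeholder with a real voice ID.
await elevenlabs.speechToSpeechStream({
  "voice_id": "YOUR_VOICE_ID",
  "enable_logging": true,
  "output_format": "mp3_44100_128"
});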
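
For readers calling the endpoint without the wrapper package, the sketch below shows one way a streamed response might be consumed chunk by chunk. It is an illustration under stated assumptions, not the documented client: `streamSpeechToSpeech`, `FetchLike`, and `StreamResponse` are hypothetical names, the response body is assumed to be async-iterable (as it is in recent Node.js versions), and the request body that carries the source audio is left to the caller because this page does not document it.

```typescript
// Minimal shape of the parts of a fetch-style API this sketch relies on.
type StreamResponse = {
  ok: boolean;
  status: number;
  body: AsyncIterable<Uint8Array> | null;
};

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: unknown },
) => Promise<StreamResponse>;

// Hypothetical helper: POSTs to the stream endpoint and hands each audio
// chunk to onChunk as it arrives. Returns the total number of bytes received.
async function streamSpeechToSpeech(
  fetchImpl: FetchLike,
  url: string,
  apiKey: string,
  body: unknown, // source audio payload; its exact shape is not covered on this page
  onChunk: (chunk: Uint8Array) => void,
): Promise<number> {
  const response = await fetchImpl(url, {
    method: "POST",
    headers: { "xi-api-key": apiKey }, // header name as documented above
    body,
  });
  if (!response.ok || response.body === null) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  let totalBytes = 0;
  for await (const chunk of response.body) {
    onChunk(chunk); // e.g. append to a file or feed an audio player
    totalBytes += chunk.byteLength;
  }
  return totalBytes;
}
```

Taking the fetch implementation as a parameter keeps the sketch independent of any particular runtime and makes it easy to exercise with a stub. Passing a runtime's global `fetch` may require a type cast, since its types are richer than `FetchLike`.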