Creates an ephemeral API token for client-side applications that use the Realtime API for realtime transcription. The session accepts the same parameters as the `transcription_session.update` client event. The response is a session object plus a `client_secret` key, which contains an ephemeral API token that can authenticate browser clients for the Realtime API. · OpenAI API

@utdk/openai /realtime/transcription_sessions

createRealtimeTranscriptionSession

POST /realtime/transcription_sessions
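As a rough illustration of the flow this endpoint supports, the sketch below sends the request from a trusted server and returns only the ephemeral token to hand to a browser client. Several details are assumptions that this page does not state: the base URL `https://api.openai.com/v1`, bearer-token authentication, and the `value` / `expires_at` fields inside `client_secret`. The names `mintEphemeralToken` and `FetchLike` are hypothetical helpers, not part of `@utdk/openai`.

```typescript
// Minimal shape of the parts of the response this sketch reads.
// ASSUMPTION: `client_secret` is an object carrying `value` and `expires_at`.
interface TranscriptionSessionResponse {
  client_secret: { value: string; expires_at: number };
  [key: string]: unknown;
}

// A fetch-like function, injected so the sketch can run without network access.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// ASSUMPTION: the base URL is not given on this page.
const BASE_URL = "https://api.openai.com/v1";

// Hypothetical helper: create a transcription session and return the ephemeral token.
async function mintEphemeralToken(
  apiKey: string,
  sessionParams: Record<string, unknown>,
  fetchImpl: FetchLike,
): Promise<string> {
  const res = await fetchImpl(`${BASE_URL}/realtime/transcription_sessions`, {
    method: "POST",
    headers: {
      // The long-lived key stays on the server; only the ephemeral token leaves it.
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(sessionParams),
  });
  if (!res.ok) {
    throw new Error(`transcription session request failed with status ${res.status}`);
  }
  const session = (await res.json()) as TranscriptionSessionResponse;
  return session.client_secret.value;
}
```

Injecting the fetch function keeps the helper testable; in a runtime that provides a global `fetch`, that function could be passed as `fetchImpl`.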

Input

modalities

The set of modalities the model can respond with. To disable audio, set this to `["text"]`.

input_audio_format

The format of input audio. Options are `pcm16`, `g711_ulaw`, or `g711_alaw`. For `pcm16`, input audio must be 16-bit PCM at a 24kHz sample rate, single channel (mono), and little-endian byte order.
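The `pcm16` requirements above can be made concrete with a small conversion sketch. It assumes the audio is already mono and already sampled at 24 kHz (no resampling is done here), and that samples arrive as floats in the range -1 to 1, as is common for browser audio APIs. The function names are hypothetical.

```typescript
// Sample rate required for `pcm16` input, per the description above.
const PCM16_SAMPLE_RATE_HZ = 24000;

// Convert mono float samples in [-1, 1] to 16-bit little-endian PCM bytes.
function floatToPcm16LE(samples: Float32Array): Uint8Array {
  const view = new DataView(new ArrayBuffer(samples.length * 2));
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot wrap around in int16.
    const s = Math.max(-1, Math.min(1, samples[i] ?? 0));
    // int16 spans -32768..32767, so negative and positive values scale differently.
    const v = s < 0 ? Math.round(s * 32768) : Math.round(s * 32767);
    view.setInt16(i * 2, v, true); // true selects little-endian byte order
  }
  return new Uint8Array(view.buffer);
}

// Duration of a mono pcm16 buffer in milliseconds (2 bytes per sample).
function pcm16DurationMs(byteLength: number): number {
  return (byteLength / 2 / PCM16_SAMPLE_RATE_HZ) * 1000;
}
```

At 24 kHz mono with 2 bytes per sample, one second of audio is 48,000 bytes, which is a quick sanity check on buffer sizes.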

input_audio_transcription

Configuration for input audio transcription. The client can optionally set the language and prompt for transcription; these offer additional guidance to the transcription service.

turn_detection

Configuration for turn detection, either Server VAD or Semantic VAD. Set this to `null` to turn it off, in which case the client must manually trigger a model response. With Server VAD, the model detects the start and end of speech based on audio volume and responds at the end of user speech. Semantic VAD is more advanced: it uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can make conversations feel more natural, but may add latency.
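The three options described above (Server VAD, Semantic VAD, or `null`) might be modelled as follows. This is a sketch only: the `type` strings `server_vad` and `semantic_vad`, and the option names `silence_duration_ms` and `eagerness`, are assumptions that this page does not list and should be checked against the API reference. The numeric value is illustrative, and `turnDetectionFor` is a hypothetical helper.

```typescript
// ASSUMPTION: field names and `type` values are not stated on this page.
type TurnDetection =
  | { type: "server_vad"; silence_duration_ms?: number }
  | { type: "semantic_vad"; eagerness?: "low" | "medium" | "high" | "auto" }
  | null;

// Hypothetical helper mapping an application-level choice to a payload value.
function turnDetectionFor(mode: "volume" | "semantic" | "manual"): TurnDetection {
  switch (mode) {
    case "volume":
      // Volume-based detection: the turn ends after a stretch of silence.
      return { type: "server_vad", silence_duration_ms: 500 };
    case "semantic":
      // Model-based estimate of whether the speaker has finished.
      return { type: "semantic_vad", eagerness: "auto" };
    case "manual":
      // Turn detection off: the client must trigger the response itself.
      return null;
  }
}

const requestBody = JSON.stringify({ turn_detection: turnDetectionFor("manual") });
```

Modelling the value as a union that includes `null` lets the type checker distinguish "turned off" from "not specified".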

input_audio_noise_reduction

Configuration for input audio noise reduction. This can be set to `null` to turn off. Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model. Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.
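Because this field accepts `null` to turn noise reduction off, it matters how the value is serialized: a JSON `null`, the string `"null"`, and an omitted key are three different things. The sketch below shows the difference using `JSON.stringify`. The `near_field` and `far_field` type values are assumptions not listed on this page, and `sessionBody` is a hypothetical helper.

```typescript
// ASSUMPTION: the `type` values are not stated on this page.
type NoiseReduction = { type: "near_field" | "far_field" } | null;

// Hypothetical helper that serializes a request body for this endpoint.
function sessionBody(noiseReduction?: NoiseReduction): string {
  return JSON.stringify({
    input_audio_format: "pcm16",
    // `undefined` is dropped by JSON.stringify; `null` is kept as JSON null.
    input_audio_noise_reduction: noiseReduction,
  });
}

const disabled = sessionBody(null); // key present with JSON null: turned off
const unspecified = sessionBody(); // key absent from the body
const nearField = sessionBody({ type: "near_field" });
```

Passing the string `"null"` instead would send a non-null string value, which is unlikely to be interpreted as "off".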

include

The set of items to include in the transcription. Currently available items: `item.input_audio_transcription.logprobs`.

Code snippet

TypeScript

import openai from '@utdk/openai';

await openai.createRealtimeTranscriptionSession({
  "input_audio_format": "pcm16",
  "input_audio_noise_reduction": null
})