Streaming Text-to-Speech

Overview

Streaming TTS generates speech incrementally and delivers audio chunks as soon as they are produced. This lowers the time to first audio and lets playback start while synthesis is still running.

Quick Start

import { createStreamingTTS } from 'react-native-sherpa-onnx/tts';

// 1) Create streaming TTS engine
const tts = await createStreamingTTS({
  modelPath: { type: 'asset', path: 'models/vits-piper-en' },
  modelType: 'vits',
});

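// 2) Generate speech with streaming callbacks
//    (playAudio is a placeholder for your own playback code)
const controller = await tts.generateSpeechStream(
  'Hello, this is streaming TTS.',
  undefined,
  {
    onChunk: (chunk) => {
      // chunk.samples: number[] of float PCM in [-1, 1]
      // chunk.sampleRate: number (Hz)
      // chunk.progress: 0..1
      // chunk.isFinal: boolean
      playAudio(chunk.samples, chunk.sampleRate);
    },
    // 3) Clean up once the stream has finished. Awaiting generateSpeechStream()
    //    does not wait for generation to finish, so don't destroy right after it.
    onEnd: () => {
      console.log('Generation complete');
      tts.destroy();
    },
    onError: (err) => {
      console.error('Error:', err.message);
      tts.destroy();
    },
  }
);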
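`playAudio` above stands in for your own playback code. If that code (or a file or network sink) expects 16-bit signed PCM rather than floats, a conversion along these lines is typical. This is a generic sketch; `floatToInt16` is a hypothetical helper, not part of react-native-sherpa-onnx.

```typescript
// Hypothetical helper: convert float PCM samples in [-1, 1] to 16-bit signed PCM.
// Not a library API; useful only for custom playback or export paths.
function floatToInt16(samples: ArrayLike<number>): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first: synthesized audio can slightly overshoot [-1, 1]
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Scale negatives by 32768 and positives by 32767 so both ends fit in int16
    out[i] = s < 0 ? Math.round(s * 32768) : Math.round(s * 32767);
  }
  return out;
}
```

In `onChunk`, you would pass `floatToInt16(chunk.samples)` together with `chunk.sampleRate` to your player.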

Built-in PCM Player

Use the native PCM player for minimal latency:

const sampleRate = await tts.getSampleRate();
await tts.startPcmPlayer(sampleRate, 1); // mono

const controller = await tts.generateSpeechStream(
  'Hello, world!',
  undefined,
  {
    onChunk: (chunk) => {
      if (chunk.samples.length > 0) {
        tts.writePcmChunk(chunk.samples);
      }
    },
    onEnd: () => tts.stopPcmPlayer(),
    onError: () => tts.stopPcmPlayer(),
  }
);

Engine Creation

Create a streaming TTS engine (same as batch TTS):

const tts = await createStreamingTTS({
  modelPath: { type: 'asset', path: 'models/vits-piper-en' },
  modelType: 'auto',  // or explicit: 'vits', 'matcha', etc.
  
  // Performance
  numThreads: 4,
  provider: 'cpu',
  
  // Model options
  modelOptions: {
    vits: {
      noiseScale: 0.667,
      noiseScaleW: 0.8,
      lengthScale: 1.0,
    },
  },
  
  // Config-level options
  maxNumSentences: 1,    // Sentences per callback
  silenceScale: 0.2,
});

Generate Speech Stream

const controller = await tts.generateSpeechStream(
  text,
  options,  // TtsGenerationOptions or undefined
  handlers  // TtsStreamHandlers
);

Generation Options

Same as batch TTS:

const controller = await tts.generateSpeechStream(
  'Hello, world!',
  {
    sid: 0,        // Speaker ID
    speed: 1.2,    // Speed multiplier
    silenceScale: 0.3,
  },
  handlers
);

Stream Handlers

interface TtsStreamHandlers {
  onChunk?: (chunk: TtsStreamChunk) => void;
  onEnd?: (event: TtsStreamEnd) => void;
  onError?: (event: TtsStreamError) => void;
}

Chunk Event

interface TtsStreamChunk {
  instanceId?: string;
  requestId?: string;
  samples: number[];    // Float PCM in [-1, 1]
  sampleRate: number;   // Sample rate in Hz
  progress: number;     // 0..1
  isFinal: boolean;     // True for last chunk
}
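Because `samples` and `sampleRate` arrive together, each chunk's playback duration is `samples.length / sampleRate` seconds; for example, 11025 samples at 22050 Hz is 0.5 s. The sketch below keeps a running total, which can drive a "generated so far" indicator. It re-declares only the two chunk fields it needs so it stands alone; `createDurationTracker` is hypothetical.

```typescript
// Local mirror of the documented chunk shape (only the fields used here)
interface ChunkLike {
  samples: number[];
  sampleRate: number;
}

// Hypothetical helper: accumulates the audio duration produced so far.
function createDurationTracker() {
  let totalSeconds = 0;
  return {
    // Call from onChunk; returns this chunk's duration in seconds
    add(chunk: ChunkLike): number {
      if (chunk.sampleRate <= 0) return 0; // guard against bad metadata
      const seconds = chunk.samples.length / chunk.sampleRate;
      totalSeconds += seconds;
      return seconds;
    },
    // Total seconds of audio received across all chunks
    get totalSeconds(): number {
      return totalSeconds;
    },
  };
}
```

Usage: create one tracker per request and call `tracker.add(chunk)` inside `onChunk`.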

End Event

interface TtsStreamEnd {
  instanceId?: string;
  requestId?: string;
  cancelled: boolean;   // True if cancelled
}

Error Event

interface TtsStreamError {
  instanceId?: string;
  requestId?: string;
  message: string;
}

Stream Controller

The controller returned by generateSpeechStream lets you stop generation and detach listeners:

interface TtsStreamController {
  cancel: () => Promise<void>;    // Stop generation
  unsubscribe: () => void;         // Remove listeners
}

Cancel Generation

const controller = await tts.generateSpeechStream(text, undefined, handlers);

// User taps "Stop"
await controller.cancel();
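A common variation is cancelling automatically when generation runs too long. The sketch relies only on the `cancel()` method shown above; `cancelAfter` and any timeout value you pick are hypothetical. Call the returned function from `onEnd` and `onError` so a stream that finishes in time is not cancelled afterwards.

```typescript
// Minimal mirror of the documented controller shape
interface CancellableStream {
  cancel: () => Promise<void>;
}

// Hypothetical helper: cancels the stream if it has not finished within `ms`.
// Returns a function that disarms the timer (call it from onEnd/onError).
function cancelAfter(controller: CancellableStream, ms: number): () => void {
  const timer = setTimeout(() => {
    controller.cancel().catch((err: unknown) => {
      console.warn('cancel failed:', err);
    });
  }, ms);
  return () => clearTimeout(timer);
}
```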

Unsubscribe Listeners

// Automatically called after onEnd/onError
// Manually call if discarding controller early
controller.unsubscribe();

PCM Player API

Start Player

const sampleRate = await tts.getSampleRate();
const numChannels = 1; // mono

await tts.startPcmPlayer(sampleRate, numChannels);

Write Chunks

onChunk: async (chunk) => {
  // Samples must be float PCM in [-1, 1]
  await tts.writePcmChunk(chunk.samples);
}
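Handlers return `void`, so awaiting inside `onChunk` does not hold back the next chunk, and several `writePcmChunk` calls can be in flight at once. If you want each write to finish before the next begins, and want write failures reported in one place, you can chain writes on a single promise. This is a generic sketch: `createSerialWriter` is hypothetical, and `write` stands for a function such as `(s) => tts.writePcmChunk(s)`.

```typescript
// Hypothetical helper: runs async writes strictly one after another, in call order.
function createSerialWriter(
  write: (samples: number[]) => Promise<void>,
  onError: (err: unknown) => void = (err) => console.error('PCM write failed:', err),
) {
  // Tail of the chain; every new write waits for the previous one
  let tail: Promise<void> = Promise.resolve();
  return {
    // Queue a chunk; returns immediately so it can be called from onChunk
    push(samples: number[]): void {
      // catch() keeps the chain alive after a failed write
      tail = tail.then(() => write(samples)).catch(onError);
    },
    // Resolves once every queued write has completed (or failed)
    drain(): Promise<void> {
      return tail;
    },
  };
}
```

In `onChunk`, call `writer.push(chunk.samples)`; in `onEnd`, `await writer.drain()` before `stopPcmPlayer()` so stopping the player does not race pending writes.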

Stop Player

await tts.stopPcmPlayer();

Complete Example

import { createStreamingTTS } from 'react-native-sherpa-onnx/tts';

async function streamSpeech(text: string) {
  const tts = await createStreamingTTS({
    modelPath: { type: 'asset', path: 'models/vits-piper-en_US' },
    modelType: 'vits',
    numThreads: 4,
  });
  // Stop the player and release the engine once the stream is over.
  // Awaiting generateSpeechStream() does not wait for generation to finish,
  // so destroying the engine in a finally block here would cut the stream off.
  const cleanup = async () => {
    await tts.stopPcmPlayer();
    await tts.destroy();
  };

  try {
    const sampleRate = await tts.getSampleRate();
    await tts.startPcmPlayer(sampleRate, 1);

    const controller = await tts.generateSpeechStream(
      text,
      { speed: 1.0 },
      {
        onChunk: (chunk) => {
          console.log(`Progress: ${(chunk.progress * 100).toFixed(0)}%`);
          if (chunk.samples.length > 0) {
            tts.writePcmChunk(chunk.samples);
          }
        },
        onEnd: (e) => {
          console.log(e.cancelled ? 'Generation cancelled' : 'Generation complete');
          cleanup();
        },
        onError: (err) => {
          console.error('TTS error:', err.message);
          cleanup();
        },
      }
    );

    // Return controller for potential cancellation
    return controller;
  } catch (err) {
    // Setup failed before streaming started: clean up right away
    await cleanup();
    throw err;
  }
}

// Usage
const controller = await streamSpeech('Hello, world!');

// Cancel if needed
// await controller.cancel();

Recording Streamed Audio

Accumulate chunks to save after generation:

const chunks: number[][] = [];
let sampleRate = 0;

// Assumes the player was started with tts.startPcmPlayer() beforehand
const controller = await tts.generateSpeechStream(text, undefined, {
  onChunk: (chunk) => {
    sampleRate = chunk.sampleRate;
    // Keep whole chunks: spreading a long sample array into push()
    // can exceed the JS engine's argument limit
    chunks.push(chunk.samples);

    // Also play live
    tts.writePcmChunk(chunk.samples);
  },
  onEnd: async () => {
    tts.stopPcmPlayer();

    // Concatenate and save the accumulated audio
    const samples = chunks.flat();
    if (samples.length > 0) {
      await saveAudioToFile(
        { samples, sampleRate },
        '/path/to/output.wav'
      );
    }
  },
  onError: () => tts.stopPcmPlayer(),
});
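If you would rather produce the file bytes yourself (for example to upload them), the accumulated float samples can be packed into a standard 16-bit PCM mono WAV container. This sketch builds the bytes in memory only; `encodeWav16` is hypothetical, and persisting the resulting `Uint8Array` is left to whatever file API your app already uses.

```typescript
// Hypothetical helper: encode mono float PCM in [-1, 1] as a 16-bit PCM WAV file.
function encodeWav16(samples: ArrayLike<number>, sampleRate: number): Uint8Array {
  const numChannels = 1;
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample * numChannels;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  // RIFF header
  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, 'WAVE');

  // "fmt " chunk: PCM format description
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = integer PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * numChannels * bytesPerSample, true); // byte rate
  view.setUint16(32, numChannels * bytesPerSample, true); // block align
  view.setUint16(34, 8 * bytesPerSample, true); // bits per sample

  // "data" chunk: little-endian int16 samples, clamped to [-1, 1] first
  writeAscii(36, 'data');
  view.setUint32(40, dataSize, true);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? Math.round(s * 32768) : Math.round(s * 32767), true);
  }
  return new Uint8Array(buffer);
}
```

Usage: in `onEnd`, `const wav = encodeWav16(chunks.flat(), sampleRate);`.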

Voice Cloning (Pocket TTS)

Stream with voice cloning for Kotlin-engine models:

const tts = await createStreamingTTS({
  modelPath: { type: 'asset', path: 'models/pocket-tts' },
  modelType: 'pocket',
});

const controller = await tts.generateSpeechStream(
  'Target text in cloned voice',
  {
    referenceAudio: { samples: refSamples, sampleRate: 22050 },
    referenceText: 'Reference transcript',
    numSteps: 20,
    extra: { temperature: '0.7' },
  },
  handlers
);
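`refSamples` is not defined in the snippet above. The example assumes it holds float PCM in [-1, 1], the same convention as streamed chunks; this is an assumption, since the expected reference format is not spelled out here. If your reference clip was recorded as 16-bit PCM, a conversion like the hypothetical `int16ToFloat` below produces that form. Pass the clip's actual sample rate alongside it; 22050 above is only illustrative.

```typescript
// Hypothetical helper: convert 16-bit signed PCM to float samples in [-1, 1).
function int16ToFloat(pcm: ArrayLike<number>): number[] {
  const out = new Array<number>(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    // Dividing by 32768 maps -32768 to exactly -1 and 32767 to just under 1
    out[i] = pcm[i] / 32768;
  }
  return out;
}
```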

Note: Streaming with reference audio is not supported for ZipVoice. Use batch generateSpeech for ZipVoice voice cloning.

Multiple Concurrent Requests

Each engine runs only one stream at a time. To serve several requests, either queue them on one engine or use several engines:

Option A: Sequential

Wait for onEnd before starting the next:

await tts.generateSpeechStream(text1, undefined, handlers1);
// Wait for onEnd...
await tts.generateSpeechStream(text2, undefined, handlers2);
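As the `// Wait for onEnd...` comment implies, awaiting `generateSpeechStream()` does not wait for generation to finish, so the waiting has to come from the handlers. One way is to wrap each request in a promise that settles from `onEnd` or `onError`. The sketch re-declares minimal local types mirroring the documented interfaces so it stands alone; `speakAndWait` and `speakSequentially` are hypothetical helpers.

```typescript
// Local mirrors of the documented types (only the fields used here)
interface StreamEnd { cancelled: boolean }
interface StreamError { message: string }
interface StreamHandlers {
  onChunk?: (chunk: { samples: number[]; sampleRate: number }) => void;
  onEnd?: (event: StreamEnd) => void;
  onError?: (event: StreamError) => void;
}
interface StreamingEngine {
  generateSpeechStream(
    text: string,
    options: object | undefined,
    handlers: StreamHandlers,
  ): Promise<{ cancel: () => Promise<void> }>;
}

// Hypothetical helper: resolves when onEnd fires, rejects when onError fires.
function speakAndWait(
  tts: StreamingEngine,
  text: string,
  onChunk?: StreamHandlers['onChunk'],
): Promise<StreamEnd> {
  return new Promise<StreamEnd>((resolve, reject) => {
    tts
      .generateSpeechStream(text, undefined, {
        onChunk,
        onEnd: resolve,
        onError: (e) => reject(new Error(e.message)),
      })
      // Starting the stream can fail before any handler runs
      .catch(reject);
  });
}

// Speak texts one after another on a single engine
async function speakSequentially(tts: StreamingEngine, texts: string[]): Promise<void> {
  for (const text of texts) {
    const end = await speakAndWait(tts, text);
    if (end.cancelled) break; // stop the queue if the user cancelled
  }
}
```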

Option B: Multiple Engines

Create separate engines:

const tts1 = await createStreamingTTS(config);
const tts2 = await createStreamingTTS(config);

// The two streams run in parallel, one per engine
await tts1.generateSpeechStream(text1, undefined, handlers1);
await tts2.generateSpeechStream(text2, undefined, handlers2);

// Wait for each engine's onEnd/onError before destroying it...
await tts1.destroy();
await tts2.destroy();
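With more requests than engines, each engine must still run only one stream at a time. A minimal sketch, assuming a `speak(engine, text)` function that resolves when that request's `onEnd` fires (for example a promise wrapper around `generateSpeechStream()`): requests are dealt round-robin into one queue per engine, and the queues run in parallel. `distributeAcrossEngines` and `speak` are hypothetical, and the engine type is left generic.

```typescript
// Hypothetical helper: spread texts over several engines so that each engine
// processes its share strictly one at a time, while engines run in parallel.
async function distributeAcrossEngines<E>(
  engines: E[],
  texts: string[],
  speak: (engine: E, text: string) => Promise<void>,
): Promise<void> {
  if (engines.length === 0) throw new Error('need at least one engine');

  // Round-robin assignment: text i goes to engine i % engines.length
  const queues: string[][] = engines.map((): string[] => []);
  texts.forEach((text, i) => queues[i % engines.length].push(text));

  await Promise.all(
    engines.map(async (engine, e) => {
      for (const text of queues[e]) {
        await speak(engine, text); // one stream at a time per engine
      }
    }),
  );
}
```

Destroy the engines only after the returned promise settles.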

Performance Tips

Threading

const tts = await createStreamingTTS({
  modelPath: { type: 'asset', path: 'models/vits-piper' },
  modelType: 'vits',
  numThreads: 4,  // Use multiple cores
});

Chunk Size

Control via maxNumSentences:

const tts = await createStreamingTTS({
  modelPath: { type: 'asset', path: 'models/vits-piper' },
  modelType: 'vits',
  maxNumSentences: 2,  // Larger chunks, less frequent callbacks
});

Memory

Avoid accumulating all chunks in JS for very long texts
Use native player to minimize JS memory usage
Save incrementally to files if needed

Error Handling

const controller = await tts.generateSpeechStream(
  text,
  undefined,
  {
    onChunk: (chunk) => playAudio(chunk.samples, chunk.sampleRate), // your own playback code
    onEnd: (e) => {
      if (!e.cancelled) {
        console.log('Success');
      }
    },
    onError: (e) => {
      console.error('TTS streaming error:', e.message);
      // Cleanup, stop playback, show error UI
    },
  }
);
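`onEnd` and `onError` often share the same cleanup (stop playback, release the engine, reset UI). Wrapping that cleanup so it runs at most once avoids double-stopping when several code paths, for example `onError` and a manual Stop button, call it. `once` is a generic hypothetical helper, not a library API.

```typescript
// Hypothetical helper: wrap an async cleanup so repeated calls share one run.
function once(fn: () => Promise<void>): () => Promise<void> {
  let pending: Promise<void> | undefined;
  return () => {
    // The first call starts the cleanup; later calls get the same promise
    if (!pending) pending = fn();
    return pending;
  };
}
```

Usage: `const cleanup = once(() => tts.stopPcmPlayer());`, then call `cleanup()` from both `onEnd` and `onError`.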

Cleanup

Always clean up resources:

const tts = await createStreamingTTS({ /* ... */ });

try {
  // Use streaming TTS
  const controller = await tts.generateSpeechStream(text, undefined, handlers);

  // Wait for onEnd/onError (or cancel) before leaving this block:
  // awaiting generateSpeechStream() does not wait for generation to finish
  // ...
} finally {
  await tts.destroy();
}

Listeners are automatically removed after onEnd or onError. Call controller.unsubscribe() manually only if discarding the controller before completion.

Supported Models

All TTS model types support streaming:

VITS (Piper)
Matcha
Kokoro
Kitten
Pocket
ZipVoice (voice cloning requires batch generateSpeech)

Next Steps

Batch TTS

Model Setup