Overview
The STT module provides offline speech recognition. Create an engine with createSTT, then transcribe audio from files or from float PCM samples. Both methods return a result object with text, tokens, timestamps, detected language, emotion, and event labels (model-dependent).
Quick Start
```ts
import { createSTT } from 'react-native-sherpa-onnx/stt';
import { listAssetModels } from 'react-native-sherpa-onnx';

// 1) Find bundled models
const models = await listAssetModels();

// 2) Create an STT engine
const stt = await createSTT({
  modelPath: { type: 'asset', path: 'models/sherpa-onnx-whisper-tiny-en' },
  modelType: 'auto',
  preferInt8: true,
});

// 3) Transcribe a WAV file
const result = await stt.transcribeFile('/path/to/audio.wav');
console.log('Transcription:', result.text);

// Clean up
await stt.destroy();
```
Transcribe from File
Transcribe a WAV file (16 kHz mono recommended):
```ts
const result = await stt.transcribeFile('/path/to/audio.wav');
console.log('Text:', result.text);
console.log('Tokens:', result.tokens);
console.log('Timestamps:', result.timestamps);
console.log('Language:', result.lang);
console.log('Emotion:', result.emotion); // model-dependent
```
Result Fields
| Field | Type | Description |
|---|---|---|
| text | string | Transcribed text |
| tokens | string[] | Token strings |
| timestamps | number[] | Timestamps per token (model-dependent) |
| lang | string | Detected or specified language |
| emotion | string | Emotion label (e.g. SenseVoice) |
| event | string | Event label (model-dependent) |
| durations | number[] | Durations for TDT models |
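Not every model populates every field. As a minimal sketch, tokens can be paired with their per-token timestamps when the model emits them:
```ts
// Pair each token with its timestamp; timestamps is model-dependent
// and may be empty for models that don't emit them.
result.tokens.forEach((token, i) => {
  const ts = result.timestamps?.[i];
  console.log(ts !== undefined ? `${ts.toFixed(2)}s ${token}` : token);
});
```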
Transcribe from Samples
Transcribe from float PCM samples (mono, [-1, 1]):
```ts
const samples: number[] = getPcmSamplesFromMic(); // your audio source
const result = await stt.transcribeSamples(samples, 16000);
console.log('Transcription:', result.text);
```
Resampling is handled automatically by sherpa-onnx when the sample rate differs from the model’s expected rate.
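So, as a sketch, 44.1 kHz input can be passed directly; the recorder helper here is hypothetical:
```ts
// sherpa-onnx resamples internally, so the input rate just needs to be
// reported accurately. getPcmSamplesFromRecorder() is a hypothetical source.
const raw: number[] = getPcmSamplesFromRecorder();
const res = await stt.transcribeSamples(raw, 44100);
```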
Supported Model Types
The SDK supports multiple STT model architectures:
| Model Type | Description | Files Required |
|---|---|---|
| transducer | Zipformer transducer | encoder.onnx, decoder.onnx, joiner.onnx, tokens.txt |
| nemo_transducer | NVIDIA NeMo transducer | encoder.onnx, decoder.onnx, joiner.onnx, tokens.txt |
| paraformer | Alibaba Paraformer | model.onnx, tokens.txt |
| whisper | OpenAI Whisper | encoder.onnx, decoder.onnx, tokens.txt |
| sense_voice | SenseVoice multilingual | model.onnx, tokens.txt |
| nemo_ctc | NVIDIA NeMo CTC | model.onnx, tokens.txt |
| wenet_ctc | WeNet CTC | model.onnx, tokens.txt |
| funasr_nano | FunASR Nano | encoder_adaptor, llm, embedding, tokenizer |
| moonshine | Moonshine | preprocess.onnx, encode.onnx, decode.onnx, tokens.txt |
| dolphin | Dolphin | model.onnx, tokens.txt |
| canary | Canary multilingual | encoder, decoder |
Use modelType: 'auto' for automatic detection based on directory structure.
Model-Specific Options
Configure model-specific options via the modelOptions parameter:
Whisper
```ts
const stt = await createSTT({
  modelPath: { type: 'asset', path: 'models/whisper-tiny' },
  modelType: 'whisper',
  modelOptions: {
    whisper: {
      language: 'en', // ISO code: 'en', 'de', 'fr', etc.
      task: 'transcribe', // 'transcribe' or 'translate' (to English)
      tailPaddings: 1000,
      enableTokenTimestamps: true, // Android only
      enableSegmentTimestamps: true, // Android only
    },
  },
});
```
Language codes: Use getWhisperLanguages() to get the full list of supported languages as { id, name } objects.
```ts
import { getWhisperLanguages } from 'react-native-sherpa-onnx/stt';

const languages = getWhisperLanguages();
// [{ id: 'en', name: 'english' }, { id: 'de', name: 'german' }, ...]
```
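A small sketch of using this list to validate user input before engine creation; the helper function is ours, not part of the SDK:
```ts
// Hypothetical helper: check a user-supplied code against the SDK list.
function isSupportedWhisperLanguage(id: string): boolean {
  return getWhisperLanguages().some((lang) => lang.id === id);
}
```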
SenseVoice
```ts
modelOptions: {
  senseVoice: {
    language: 'auto', // 'auto', 'zh', 'en', 'yue', 'ja', 'ko'
    useItn: true, // Inverse text normalization
  },
}
```
Get supported languages:
```ts
import { getSenseVoiceLanguages } from 'react-native-sherpa-onnx/stt';

const languages = getSenseVoiceLanguages();
```
Canary
```ts
modelOptions: {
  canary: {
    srcLang: 'en', // 'en', 'es', 'de', 'fr'
    tgtLang: 'en',
    usePnc: true, // Use punctuation
  },
}
```
FunASR Nano
```ts
modelOptions: {
  funasrNano: {
    language: '中文', // '中文' (Chinese), '英文' (English), '日文' (Japanese)
    systemPrompt: 'Custom system prompt',
    userPrompt: 'Custom user prompt',
    maxNewTokens: 512,
    temperature: 0.7,
    topP: 0.95,
    itn: true,
    hotwords: 'keyword1,keyword2',
  },
}
```
Hotwords (Contextual Biasing)
Boost recognition of specific words or phrases. Only supported for transducer models (transducer, nemo_transducer).
```ts
import { sttSupportsHotwords } from 'react-native-sherpa-onnx/stt';

if (sttSupportsHotwords('transducer')) {
  const stt = await createSTT({
    modelPath: { type: 'asset', path: 'models/zipformer-transducer' },
    modelType: 'transducer',
    hotwordsFile: '/path/to/hotwords.txt',
    hotwordsScore: 1.5,
  });
}
```
The hotwords file contains one phrase per line, optionally followed by a per-phrase boost score:
```text
REACT NATIVE 2.0
SHERPA ONNX 1.8
MACHINE LEARNING
```
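A sketch of generating the file at runtime, assuming react-native-fs is installed in the project; the path choice is just an example:
```ts
import RNFS from 'react-native-fs';

// Write phrases (optionally followed by a boost score) to a file
// the engine can read via hotwordsFile.
const phrases = ['REACT NATIVE 2.0', 'SHERPA ONNX 1.8', 'MACHINE LEARNING'];
const hotwordsPath = `${RNFS.DocumentDirectoryPath}/hotwords.txt`;
await RNFS.writeFile(hotwordsPath, phrases.join('\n'), 'utf8');
```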
Runtime Config Updates
Update hotwords and decoding parameters without reloading:
```ts
await stt.setConfig({
  decodingMethod: 'modified_beam_search',
  maxActivePaths: 4,
  hotwordsFile: '/path/to/new-hotwords.txt',
  hotwordsScore: 2.0,
  blankPenalty: 0.0,
});
```
Advanced Configuration
```ts
const stt = await createSTT({
  modelPath: { type: 'asset', path: 'models/whisper-tiny' },
  modelType: 'auto',
  numThreads: 4, // Use multiple CPU threads
  preferInt8: true, // Use quantized models for speed
});
```
Execution Providers
Accelerate inference with hardware backends:
```ts
const stt = await createSTT({
  modelPath: { type: 'asset', path: 'models/whisper-tiny' },
  modelType: 'auto',
  provider: 'nnapi', // 'cpu', 'nnapi' (Android), 'qnn', 'xnnpack'
});
```
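Since NNAPI is Android-only, one pattern is to pick the provider per platform. A sketch; which backends are actually available depends on your build:
```ts
import { Platform } from 'react-native';

// Fall back to CPU on platforms without the accelerated backend.
const provider = Platform.OS === 'android' ? 'nnapi' : 'cpu';
const stt = await createSTT({
  modelPath: { type: 'asset', path: 'models/whisper-tiny' },
  modelType: 'auto',
  provider,
});
```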
Inverse Text Normalization (ITN)
Convert spoken forms to written forms (e.g., “twenty twenty four” → “2024”):
```ts
const stt = await createSTT({
  modelPath: { type: 'asset', path: 'models/zipformer' },
  modelType: 'transducer',
  ruleFsts: '/path/to/rule1.fst,/path/to/rule2.fst',
  ruleFars: '/path/to/rule.far',
});
```
Best Practices
- Sample rate: Most models expect 16 kHz; some support 8/16/48 kHz
- Channels: Mono (single channel)
- Format: 16-bit PCM WAV
- Pre-process: Use convertAudioToWav16k to ensure the correct format
```ts
import { convertAudioToWav16k } from 'react-native-sherpa-onnx/audio';

const wavPath = await convertAudioToWav16k('/path/to/input.mp3');
const result = await stt.transcribeFile(wavPath);
```
Long Audio Files
For very long recordings, consider:
- Splitting the audio into smaller chunks to reduce memory usage (see the sketch below)
- Using streaming STT for real-time processing
- Processing in the background to avoid blocking the UI
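A minimal chunking sketch, assuming an existing stt engine. Fixed windows can split words at chunk boundaries, so overlapping windows or the streaming API are better suited to production:
```ts
// Transcribe fixed-size windows of samples and join the text.
async function transcribeInChunks(
  samples: number[],
  sampleRate: number,
  chunkSeconds = 30,
): Promise<string> {
  const chunkSize = chunkSeconds * sampleRate;
  const parts: string[] = [];
  for (let i = 0; i < samples.length; i += chunkSize) {
    const chunk = samples.slice(i, i + chunkSize);
    const result = await stt.transcribeSamples(chunk, sampleRate);
    parts.push(result.text);
  }
  return parts.join(' ').trim();
}
```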
Memory Management
```ts
// Always destroy when done. Declare the handle outside the try block so
// the finally clause can still reach it if createSTT throws.
let stt;
try {
  stt = await createSTT(config);
  const result = await stt.transcribeFile(path);
  return result;
} finally {
  await stt?.destroy();
}
```
Error Handling
```ts
try {
  const stt = await createSTT({
    modelPath: { type: 'asset', path: 'models/whisper-tiny' },
    modelType: 'auto',
  });
  const result = await stt.transcribeFile('/path/to/audio.wav');
  console.log(result.text);
  await stt.destroy();
} catch (error) {
  if (error.code === 'HOTWORDS_NOT_SUPPORTED') {
    console.error('This model does not support hotwords');
  } else {
    console.error('STT error:', error.message);
  }
}
```
Model Discovery
List available bundled models:
```ts
import { listAssetModels } from 'react-native-sherpa-onnx';

const models = await listAssetModels();
const sttModels = models.filter((m) => m.hint === 'stt');
console.log('Available STT models:', sttModels);
```
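To wire discovery into engine creation, something like the following could work; the path field on the model descriptor is an assumption, so check the actual shape returned by listAssetModels:
```ts
// Hypothetical glue: create an engine from the first bundled STT model.
if (sttModels.length > 0) {
  const stt = await createSTT({
    modelPath: { type: 'asset', path: sttModels[0].path }, // field name assumed
    modelType: 'auto',
  });
}
```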
Next Steps
- Streaming STT: Real-time speech recognition with live transcription
- Model Setup: Download and configure STT models